Parsing the Linux Device Tree Before the Kernel


Over the past few weeks, I’ve been working on updating a Linux device driver for one of my clients to add support for overlay windows in their display controller, which resides in an FPGA. The initial version of the driver had one major inefficiency; understanding it requires a brief description of the system design. The display controller and the driver were both targeted to run on an Intel/Altera Cyclone V, sometimes referred to as “socfpga”. The Cyclone V consists of FPGA fabric, referred to as the programmable logic (PL), and an ARM core, referred to as the processing system (PS), which runs the Linux operating system. The “control plane” interface between the display controller, which resides in the PL, and the Linux device driver is AXI-Lite; the “data plane” interface between the two is full AXI. All this means is that there is a memory-mapped interface between the ARM core/Linux and the display controller/FPGA. (Note: in implementations where the ARM processor and the FPGA do not share the same address space (i.e. silicon die), some sort of bridge is usually necessary in the FPGA fabric to allow the ARM processor to communicate with the FPGA core.) I was given the address at which the display controller resided in the PL, and I just needed to reference this address in the device tree, remap it to a kernel virtual address, and perform register writes into the display controller at offsets from the mapped base address, shown below:

clcd: clcd@ff3c2000 {
      compatible = "altr,clcd";
      reg = <0xff3c2000 0x1000>;
      interrupts = <0 13 0x4>;
      status = "disabled";
      clocks = <&h2f_usr1_clk>;
};

fbi->mmio_base = ioremap(r->start, resource_size(r));

static inline void lcd_writel(struct driver_info *fbi, unsigned int off, unsigned long val)
{
    writel(val, fbi->mmio_base + off);
}

In the above snippets of code, the first portion shows the entry in the device tree. The first element of reg = <0xff3c2000 0x1000> between the angle brackets is the physical memory location where the address space of the display controller core begins, and the second element is the size of that address space. The second snippet shows the remapping of the physical address into a kernel virtual address. Finally, the third portion shows the function that performs a write to a particular register of the core.
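To show where these pieces fit together, here is a minimal sketch of the probe path that produces the mapping above. The function name clcd_probe is an assumption (only driver_info and mmio_base come from the snippets above), and I use the modern devm_ioremap_resource() helper rather than a bare ioremap() for illustration:

```c
static int clcd_probe(struct platform_device *pdev)
{
        struct resource *r;
        struct driver_info *fbi;

        fbi = devm_kzalloc(&pdev->dev, sizeof(*fbi), GFP_KERNEL);
        if (!fbi)
                return -ENOMEM;

        /* The reg = <0xff3c2000 0x1000> device tree entry surfaces
         * here as a memory resource. */
        r = platform_get_resource(pdev, IORESOURCE_MEM, 0);

        /* Request and remap the region into kernel virtual address
         * space in one step. */
        fbi->mmio_base = devm_ioremap_resource(&pdev->dev, r);
        if (IS_ERR(fbi->mmio_base))
                return PTR_ERR(fbi->mmio_base);

        /* ... remaining setup elided ... */
        return 0;
}
```

From here, register accesses go through helpers like the lcd_writel() shown above, which simply add an offset to mmio_base.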

While the standard kernel mechanisms to interface with memory-mapped devices were sufficient for register reads and writes to the display controller, they did not suffice for the framebuffer memory (recall that the framebuffer contains the pixel data that is to be drawn on the display). This was for a few reasons. First, the display controller is an actual device that fetches pixel data on its own; it has no notion of the kernel’s virtual address space, so it must access the physical memory where the framebuffer data resides directly. Second, we could have simply performed a kmalloc to get a physically contiguous block of memory. However, the upper bound on the amount of memory that kmalloc can provide is tied to MAX_ORDER: the largest contiguous allocation the page allocator will hand out is 2^(MAX_ORDER - 1) pages. Since MAX_ORDER is a kernel parameter that is set at build time, the end user of the driver would require a different kernel build for a sufficiently large panel resolution. Of course, this implementation would have been a disaster. For every end user, who may require a different display resolution, we would have needed to calculate the MAX_ORDER required to satisfy the resolution of the display, and either build the kernel ourselves or instruct the end user on this process (instead of simply changing an entry in the device tree). If I were a client and heard this from a vendor, I would have passed on the driver.

The alternative is a neat feature that exists in the memory management subsystem for ARM Linux. The function, called arm_memblock_steal, sets memory aside, away from the kernel, for any driver to use. Now, the original driver implementation pre-allocated an 8 MB buffer for framebuffer data. This was because, at most, the framebuffer may need to hold 1920 x 1080 pixels x 32 bits per pixel / 8 bits per byte, since this is the highest resolution and bits-per-pixel configuration supported by the display controller. While this was not as destructive as simply increasing MAX_ORDER (which, at a high level, would have achieved the same outcome but may also have implications throughout the entire kernel, instead of the more targeted approach of arm_memblock_steal), it was still inefficient. The driver was stealing valuable memory away from the kernel that it may not ultimately need for framebuffer data. For example, if a user only needed to work with a display that had a resolution of 480 x 240 pixels with 32 bits per pixel, only 450 KB would be needed for framebuffer data, and the remaining 7.5 MB would be inaccessible to the rest of the kernel.
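For reference, the original fixed-size scheme looked roughly like the following, invoked from the machine descriptor's ->reserve hook early in boot. The function and variable names here are illustrative, not the client's actual code; only arm_memblock_steal() itself is a real kernel API (32-bit ARM only, early boot only):

```c
/* Worst case supported by the controller:
 * 1920 x 1080 x 4 bytes, rounded up to 8 MB. */
#define CLCD_FB_MAX_SIZE SZ_8M

static phys_addr_t clcd_fb_phys;

/* Called from the machine_desc ->reserve callback, long before any
 * driver is probed. */
static void __init clcd_reserve_fb(void)
{
        /* arm_memblock_steal() removes the region from the kernel's
         * view of memory entirely; the kernel will never map or
         * allocate from it again. */
        clcd_fb_phys = arm_memblock_steal(CLCD_FB_MAX_SIZE, SZ_1M);
}
```

Because the size is a compile-time constant, the full 8 MB disappears from the kernel regardless of the panel actually attached, which is exactly the inefficiency described above.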

To alleviate this inefficiency, I thought to use the device tree. I’ve seen other BSPs carry an entry for the resolution in the device tree for the display driver, and reasoned that I could do the same. The end user of the driver would not be changing the display while the system is booted and the kernel is running, so my assumption seemed valid. However, because arm_memblock_steal must take place during machine bootup, before the drivers have been probed and before the kernel has actually parsed the device tree, I couldn’t query the device tree using the traditional methods leveraged when writing a device driver (i.e. the of_get_property family of functions). I was stumped. Nonetheless, I knew that others had definitely hit this problem and that the kernel most definitely has a solution. I searched the kernel source and found a really cool function:

int __init of_scan_flat_dt(int (*it)(unsigned long node,
                           const char *uname, int depth, 
                           void *data), void *data)

This function iterates over the entire flattened device tree, parses each node, and passes the parsed node as an argument to a callback defined by the author of the driver. A pointer to this callback is the first argument to of_scan_flat_dt. The callback should return a nonzero value if the desired node has been found, and zero otherwise. This way, once the callback determines that the desired node has been found, its return value instructs of_scan_flat_dt to stop iterating over the device tree. In addition to simply checking whether the node exists, I also performed the requisite checks on the resolution string, calculated the amount of memory needed for framebuffer data, allocated the memory away from the kernel, and stored the start address of the physical memory allocated away from the kernel in a static variable. This was ultimately passed to the driver via a custom data structure set during the rest of the boot process (more on that in a later post!).
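A minimal sketch of such a callback is below. The property name "display-resolution" and the helper clcd_fb_size_from_string() are assumptions for illustration; of_flat_dt_is_compatible(), of_get_flat_dt_prop(), and arm_memblock_steal() are real kernel APIs usable at this early stage:

```c
static phys_addr_t clcd_fb_phys;

/* Callback for of_scan_flat_dt(). Returns nonzero to stop the scan,
 * zero to continue to the next node. */
static int __init clcd_fdt_scan(unsigned long node, const char *uname,
                                int depth, void *data)
{
        const char *res;

        if (!of_flat_dt_is_compatible(node, "altr,clcd"))
                return 0;       /* not our node: keep scanning */

        /* "display-resolution" is a hypothetical property holding a
         * string such as "1920x1080". */
        res = of_get_flat_dt_prop(node, "display-resolution", NULL);
        if (res) {
                /* Hypothetical helper: validate the string and return
                 * xres * yres * bytes-per-pixel. */
                phys_addr_t size = clcd_fb_size_from_string(res);

                clcd_fb_phys = arm_memblock_steal(size, SZ_1M);
        }
        return 1;               /* found our node: stop scanning */
}
```

The scan itself is then a one-liner from early boot code: of_scan_flat_dt(clcd_fdt_scan, NULL). Only the memory the attached panel actually needs is taken from the kernel.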

In summary, the kernel has some neat trickery that will allow you to parse the device tree before even the kernel itself has done so! However, I would advise that you keep the parsing simple, and I caution against operations that are too involved. They could slow down the boot process and cause unintended consequences, as the kernel is still in its fetal state at this point.

For all of your Linux kernel needs, including optimization and custom drivers, please contact us at