Passing Board Information to Your Device Driver (and Navigating the Kernel)

In a previous article (https://mab-labs.com/parsing-the-device-tree-before-kernel/), I described how to pre-allocate memory away from the kernel for use by your driver, and how to use the device tree to determine how much memory to allocate. That article did not explain how the pre-allocation fits into the kernel boot process. Specifically, it did not say which data structures we can use to accomplish this task. This article aims to answer that question.

While the crux of this article is an analysis of the kernel code responsible for setting up our board and passing relevant information to our driver, I will also show you the tools I use to navigate the kernel. This matters because you are (most likely) not the first person to run into a given issue when writing a driver or modifying the kernel, and you can often adapt someone else's solution to your needs, albeit with some modifications. Even if your issue turns out to be unique and you already have a working solution, it is worth studying how others have implemented similar solutions so that you can follow their paradigm and increase the chance that your fix is accepted into the mainline kernel.

The first tool I use to navigate the kernel is Bootlin's Elixir cross-referencer for the Linux kernel (https://elixir.bootlin.com/linux/latest/source). It is tremendously valuable: you enter an identifier (such as a function, variable, or macro name) in the search bar, and it lists every definition of and reference to that identifier. Since this article deals specifically with the 4.19 kernel, bookmark https://elixir.bootlin.com/linux/v4.19.89/source for the duration of this article. The second tool is ripgrep (https://github.com/BurntSushi/ripgrep), a faster and generally better alternative to grep. When using ripgrep, I make heavy use of its regular-expression support to determine whether a hit is a declaration, a definition, or an invocation of a function. I believe these two tools, coupled with a local copy of the kernel source, are all you need to navigate the kernel effectively.

Getting back to the problem at hand, we need to understand which hooks the kernel provides for custom callbacks that are invoked during the boot process. To start, let's look at an existing example: socfpga (recall that this is the "architecture" associated with the Altera/Intel FPGAs with a built-in ARM core that we targeted last time). If we open arch/arm/mach-socfpga/socfpga.c, we see the following:

DT_MACHINE_START(SOCFPGA, "Altera SOCFPGA")
        .l2c_aux_val   = 0,
        .l2c_aux_mask  = ~0,
        .init_irq      = socfpga_init_irq,
        .restart       = socfpga_cyclone5_restart,
        .dt_compat     = altera_dt_match,
MACHINE_END

DT_MACHINE_START(SOCFPGA_A10, "Altera SOCFPGA Arria10")
        .l2c_aux_val   = 0,
        .l2c_aux_mask  = ~0,
        .init_irq      = socfpga_arria10_init_irq,
        .restart       = socfpga_arria10_restart,
        .dt_compat     = altera_a10_dt_match,
MACHINE_END

DT_MACHINE_START and MACHINE_END are macros defined in arch/arm/include/asm/mach/arch.h. DT_MACHINE_START is defined as:

#define DT_MACHINE_START(_name, _namestr)            \
static const struct machine_desc __mach_desc_##_name \
__used                                              \
__attribute__((__section__(".arch.info.init"))) = { \
        .nr            = ~0,                          \
        .name          = _namestr,

/* MACHINE_END, from the same header, closes the initializer: */
#define MACHINE_END                                  \
};

There are two takeaways here. First, the combination of the DT_MACHINE_START macro, the member initializers, and the MACHINE_END macro creates a struct of type machine_desc for a specific processor-board combination (referred to as a "platform" or "machine"). Second, using the section attribute to tell the compiler to place this struct in the .arch.info.init section is a neat way to support any number of platforms. It allows the kernel to iterate over all of the supported machine descriptors and match the correct one against the compatible string at the root of the device tree, without knowing a priori how many exist (we'll cover how the linker script lets the kernel iterate over a variable number of entries in a future post).
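The section trick can be reproduced in ordinary user-space C. The sketch below is a simplified model, not kernel code: struct machine, MACHINE_START, the "machines" section, and both boards are hypothetical. It also rests on a toolchain assumption: GNU ld and lld automatically define __start_<name> and __stop_<name> for any section whose name is a valid C identifier. ".arch.info.init" is not such a name, so the kernel brackets that section with its own symbols in the ARM linker script instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct machine_desc. */
struct machine {
    const char *name;
    const char *compat;        /* the kernel uses a NULL-terminated dt_compat list */
    void (*init_irq)(void);
};

/* Same idea as DT_MACHINE_START/MACHINE_END: every descriptor lands in the
 * "machines" section. The explicit alignment keeps the compiler from padding
 * entries apart, so the section behaves like one contiguous array. */
#define MACHINE_START(_id, _namestr)                              \
    static const struct machine mach_desc_##_id                   \
    __attribute__((used, section("machines"),                     \
                   aligned(__alignof__(struct machine)))) = {     \
        .name = _namestr,
#define MACHINE_END };

/* Provided by the linker (assumption: GNU ld or lld). */
extern const struct machine __start_machines[];
extern const struct machine __stop_machines[];

static int board_a_irqs, board_b_irqs;
static void board_a_init_irq(void) { board_a_irqs++; }
static void board_b_init_irq(void) { board_b_irqs++; }

MACHINE_START(BOARD_A, "Hypothetical Board A")
    .compat   = "vendor,board-a",
    .init_irq = board_a_init_irq,
MACHINE_END

MACHINE_START(BOARD_B, "Hypothetical Board B")
    .compat   = "vendor,board-b",
    .init_irq = board_b_init_irq,
MACHINE_END

/* Walk every descriptor the linker collected; no count is needed up front. */
static const struct machine *match_machine(const char *dt_compat)
{
    for (const struct machine *m = __start_machines; m < __stop_machines; m++)
        if (strcmp(m->compat, dt_compat) == 0)
            return m;
    return NULL;
}
```

Supporting a third board is just another MACHINE_START/MACHINE_END block; match_machine() never changes, which is the property the kernel gets from .arch.info.init.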

As an example, let's see how init_irq, which is set to the socfpga_init_irq function, gets invoked by the kernel. The idea is that we might be able to use another callback declared in the machine_desc struct (defined in arch/arm/include/asm/mach/arch.h) to pre-allocate the memory we need. Searching Bootlin's cross-reference for invocations of init_irq shows that it is called from init_IRQ in arch/arm/kernel/irq.c (if you have a different processor type, you'll have to look through the appropriate directory):

void __init init_IRQ(void)
{
        int ret;
        if (IS_ENABLED(CONFIG_OF) && !machine_desc->init_irq)
            irqchip_init();
        else
            machine_desc->init_irq();
        /* ... */
}

On kernels built with device tree support (CONFIG_OF), the function above falls back to the generic irqchip_init() when the machine does not implement init_irq; otherwise, it calls the machine's init_irq. init_IRQ is called by start_kernel in init/main.c, and start_kernel is ultimately called by assembly code, which is beyond the scope of this article (if you'd like a deeper understanding of the full boot process of the Linux kernel on an ARM processor, I recommend reading https://www.kernel.org/doc/Documentation/arm/Booting).
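The dispatch in init_IRQ can be modeled in a few lines of user-space C. This is a sketch, not the kernel's code: struct machine_desc_model, model_irqchip_init, model_board_init_irq, and the config_of flag (standing in for IS_ENABLED(CONFIG_OF)) are all hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct machine_desc. */
struct machine_desc_model {
    const char *name;
    void (*init_irq)(void);   /* optional board hook; NULL means "not provided" */
};

/* Counters so we can observe which path ran. */
static int generic_irq_inits;
static int board_irq_inits;

/* Stands in for the kernel's generic irqchip_init(). */
static void model_irqchip_init(void) { generic_irq_inits++; }

/* Stands in for a board hook such as socfpga_init_irq(). */
static void model_board_init_irq(void) { board_irq_inits++; }

/* Same decision as init_IRQ(): with device-tree support and no board hook,
 * take the generic path; otherwise call the board hook. As in the kernel,
 * a machine without device-tree support must provide init_irq. */
static void model_init_IRQ(const struct machine_desc_model *md, bool config_of)
{
    if (config_of && !md->init_irq)
        model_irqchip_init();
    else
        md->init_irq();
}

static const struct machine_desc_model board_with_hook = {
    .name = "board with init_irq", .init_irq = model_board_init_irq,
};
static const struct machine_desc_model board_without_hook = {
    .name = "board without init_irq", .init_irq = NULL,
};
```

The appeal of this design is that a board only overrides what it needs: leaving init_irq unset on a device-tree kernel costs nothing and still gets a working interrupt controller.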

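Taking a look at the machine_desc struct definition, we can use the reserve callback to perform the pre-allocation we need. The call trace for the reserve callback is as follows:

start_kernel (init/main.c)
  -> setup_arch (arch/arm/kernel/setup.c)
    -> arm_memblock_init (arch/arm/mm/init.c)
      -> mdesc->reserve()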

The paradigm for calling the reserve callback in arm_memblock_init is the same one we saw earlier for init_irq: the kernel calls the machine-specific callback only if it is implemented:

/* reserve any platform specific memblock areas */
    if (mdesc->reserve)
        mdesc->reserve();
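To make the reserve step concrete, here is a toy user-space model of what a board's reserve callback might do. It is a sketch under loudly stated assumptions: the memblock is replaced by a tiny table (toy_memblock_reserve is hypothetical and does no overlap checking), RAM is assumed to be 1 GiB starting at address 0, and the 16 MiB buffer size is made up (in practice it would come from the device tree, as in the previous article). In a real board file you would call memblock_reserve(), or on 32-bit ARM arm_memblock_steal(), which also removes the range from the kernel's view of memory.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;   /* stand-in for the kernel type */

struct region { phys_addr_t base, size; };

/* Toy model of memblock: one RAM bank (assumed 1 GiB at 0) plus a table of
 * reserved ranges. Unlike the real allocator, it does not check overlaps. */
static const struct region ram = { 0x00000000u, 0x40000000u };
static struct region reserved[8];
static size_t nr_reserved;

/* Hypothetical stand-in for memblock_reserve(base, size). */
static int toy_memblock_reserve(phys_addr_t base, phys_addr_t size)
{
    if (nr_reserved == sizeof(reserved) / sizeof(reserved[0]))
        return -1;
    reserved[nr_reserved].base = base;
    reserved[nr_reserved].size = size;
    nr_reserved++;
    return 0;
}

/* Hypothetical board state: the size is assumed here; the base is recorded
 * so that init_machine can later hand it to the driver. */
static phys_addr_t board_buf_size = 16u << 20;  /* 16 MiB, assumed */
static phys_addr_t board_buf_base;              /* 0 until reserved */

/* Plays the role of a machine_desc .reserve callback: carve a 1 MiB-aligned
 * buffer off the top of RAM so the kernel never hands it out. */
static void board_reserve(void)
{
    const phys_addr_t align = (phys_addr_t)1 << 20;
    phys_addr_t top = ram.base + ram.size;
    phys_addr_t base = (top - board_buf_size) & ~(align - 1);

    if (toy_memblock_reserve(base, board_buf_size) == 0)
        board_buf_base = base;
}

struct machine_desc_model {
    void (*reserve)(void);
};

/* Mirrors the arm_memblock_init() fragment: call the hook only if present. */
static void model_arm_memblock_init(const struct machine_desc_model *mdesc)
{
    /* reserve any platform specific memblock areas */
    if (mdesc->reserve)
        mdesc->reserve();
}

static const struct machine_desc_model my_board = { .reserve = board_reserve };
```

Calling model_arm_memblock_init(&my_board) leaves one 16 MiB region reserved at 0x3F000000, and board_buf_base holds that address for the later hand-off to the driver.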

Now that we have a way to pre-allocate memory away from the kernel during boot, we need to tell our driver where this area resides in memory. Looking back at the machine_desc struct, we could use the init_machine callback. Ripgrepping through the kernel (unfortunately, Elixir's cross-reference does not pick up invocations of this callback), we can see that the callback is called from customize_machine, which is registered in arch/arm/kernel/setup.c, right below its definition:

arch_initcall(customize_machine);

What does arch_initcall mean? Well, searching for it through the kernel, we can see it’s defined in include/linux/init.h as:

#define arch_initcall(fn)       __define_initcall(fn, 3)

And __define_initcall is defined as (in the same header file):

#define __define_initcall(fn, id) \
    static initcall_t __initcall_##fn##id __used \
    __attribute__((__section__(".initcall" #id ".init"))) = fn;

The snippet above uses the same section-attribute trick as the machine_desc definition we showed earlier: it stores a pointer to fn in a section named .initcall3.init (for arch_initcall). If we search for other references to .initcall, we find the following in the kernel linker script definitions in include/asm-generic/vmlinux.lds.h:

#define INIT_CALLS_LEVEL(level)                     \
        __initcall##level##_start = .;              \
        KEEP(*(.initcall##level##.init))            \
        KEEP(*(.initcall##level##s.init))           \

#define INIT_CALLS                          \
        __initcall_start = .;                   \
        KEEP(*(.initcallearly.init))                \
        INIT_CALLS_LEVEL(0)                 \
        INIT_CALLS_LEVEL(1)                 \
        INIT_CALLS_LEVEL(2)                 \
        INIT_CALLS_LEVEL(3)                 \
        INIT_CALLS_LEVEL(4)                 \
        INIT_CALLS_LEVEL(5)                 \
        INIT_CALLS_LEVEL(rootfs)                \
        INIT_CALLS_LEVEL(6)                 \
        INIT_CALLS_LEVEL(7)                 \
        __initcall_end = .;

The snippet above groups the initcall function pointers by level in the final binary. Any function registered with arch_initcall lands in the level 3 group. Because each level's pointers sit contiguously, in ascending level order, the kernel only needs sentinel symbols defined by the linker script (such as __initcall3_start, and __initcall_end after the last level) to know where each group begins and ends. You can see these symbols by searching your System.map for __initcall3_start, for example. The kernel iterates through these functions with the following code snippet in init/main.c:

extern initcall_t __initcall_start[];
extern initcall_t __initcall0_start[];
extern initcall_t __initcall1_start[];
extern initcall_t __initcall2_start[];
extern initcall_t __initcall3_start[];
extern initcall_t __initcall4_start[];
extern initcall_t __initcall5_start[];
extern initcall_t __initcall6_start[];
extern initcall_t __initcall7_start[];
extern initcall_t __initcall_end[];

static initcall_t *initcall_levels[] __initdata = {
    __initcall0_start,
    __initcall1_start,
    __initcall2_start,
    __initcall3_start,
    __initcall4_start,
    __initcall5_start,
    __initcall6_start,
    __initcall7_start,
    __initcall_end,
};
/* ... */
static void __init do_initcall_level(int level)
{
    initcall_t *fn;

    strcpy(initcall_command_line, saved_command_line);
    parse_args(initcall_level_names[level],
           initcall_command_line, __start___param,
           __stop___param - __start___param,
           level, level,
           NULL, &repair_env_string);

    trace_initcall_level(initcall_level_names[level]);
    for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++)
        do_one_initcall(*fn);
}
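The registration and iteration steps can be mimicked together in user space. The sketch below is a simplified model with hypothetical names (model_define_initcall, model_arch_initcall, customize_machine_model, board_init_machine, and the "initcall1"/"initcall3" sections). One deliberate difference from the kernel: since we cannot rely on the default linker script to place the levels back-to-back, each level is bounded by its own __start_/__stop_ pair (assuming GNU ld or lld provides them) instead of by the next level's start symbol.

```c
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Simplified model of __define_initcall(fn, id): store a pointer to fn in a
 * per-level section named "initcall<id>", a valid C identifier. */
#define model_define_initcall(fn, id)                          \
    static initcall_t model_initcall_##fn##id                  \
    __attribute__((used, section("initcall" #id))) = fn
#define model_core_initcall(fn) model_define_initcall(fn, 1)
#define model_arch_initcall(fn) model_define_initcall(fn, 3)

/* Provided by the linker (assumption: GNU ld or lld). */
extern initcall_t __start_initcall1[], __stop_initcall1[];
extern initcall_t __start_initcall3[], __stop_initcall3[];

/* Records the order in which the callbacks ran. */
static char call_order[8];
static size_t ncalls;

/* Hypothetical board hook, standing in for machine_desc->init_machine. */
static void board_init_machine(void) { call_order[ncalls++] = 'M'; }
static void (*model_init_machine)(void) = board_init_machine;

/* Same shape as customize_machine(): call init_machine only if provided. */
static int customize_machine_model(void)
{
    call_order[ncalls++] = 'A';
    if (model_init_machine)
        model_init_machine();
    return 0;
}
model_arch_initcall(customize_machine_model);

/* A level-1 (core) initcall defined later in the file still runs first. */
static int core_setup_model(void) { call_order[ncalls++] = 'C'; return 0; }
model_core_initcall(core_setup_model);

/* Mirrors the loop in do_initcall_level(). */
static void model_do_initcall_level(initcall_t *start, initcall_t *end)
{
    for (initcall_t *fn = start; fn < end; fn++)
        (*fn)();
}

/* Mirrors do_initcalls(): walk the levels in ascending order. */
static void model_do_initcalls(void)
{
    model_do_initcall_level(__start_initcall1, __stop_initcall1);
    model_do_initcall_level(__start_initcall3, __stop_initcall3);
}
```

After model_do_initcalls(), the core-level call runs first, then customize_machine_model, which in turn invokes the board hook: ordering comes from the level, not from where the function is defined.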

In do_initcall_level, initcall_levels is an array of pointers into the initcall tables, and the loop calls every function pointer between the start of one level and the start of the next. The __initcallN_start symbols are declared extern in init/main.c but are defined by the linker script, so the linker resolves them to actual addresses. do_initcall_level is called by do_initcalls, which in turn is called by do_basic_setup, which in turn is called by kernel_init_freeable, all in init/main.c.

Now that we know which callback is best to use, we still need to tell our driver the location of our pre-allocated memory (i.e. we need to write the function that forms the body of the init_machine callback). We can do something similar to what we did in the last article and use the device tree as the "glue" between platform bring-up and the driver, via the function of_platform_populate in drivers/of/platform.c. Its prototype is:

int of_platform_populate(struct device_node *root,
            const struct of_device_id *matches,
            const struct of_dev_auxdata *lookup,
            struct device *parent);

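This function walks the device tree starting at root and creates platform devices for the nodes it finds. Through the lookup argument, it can also attach custom data to specific nodes: all that's needed is to fill in an array of of_dev_auxdata entries, terminated by an empty entry, with the appropriate data. Each entry matches a node by its compatible string and physical address, and supplies a device name and a platform_data pointer. The structure is defined in include/linux/of_platform.h as: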

struct of_dev_auxdata {
    char *compatible;
    resource_size_t phys_addr;
    char *name;
    void *platform_data;
};
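The lookup that of_platform_populate performs against this table can be modeled in user space. This is a simplified sketch, not the kernel's implementation: struct fake_node, struct board_buf_info, board_auxdata, model_dev_lookup, the "vendor,fpga-dma" compatible, and both addresses are hypothetical. The kernel's matching is also more lenient (for example, it accepts a compatible-only match for nodes without an address). In a real board file, the init_machine callback would pass such a table as of_platform_populate(NULL, of_default_bus_match_table, board_auxdata, NULL), and the driver's probe function could read the pointer back with dev_get_platdata(&pdev->dev).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t resource_size_t;   /* stand-in for the kernel type */

/* Same fields as the kernel's struct of_dev_auxdata. */
struct of_dev_auxdata {
    char *compatible;
    resource_size_t phys_addr;
    char *name;
    void *platform_data;
};

/* Hypothetical device-tree node: a compatible string and its first reg address. */
struct fake_node {
    const char *compatible;
    resource_size_t reg_base;
};

/* Hypothetical data the board code hands to the driver: the region that
 * the reserve callback set aside. */
struct board_buf_info {
    resource_size_t base;
    resource_size_t size;
};

static struct board_buf_info buf_info = { 0x3F000000u, 16u << 20 };

/* Lookup table an init_machine callback would pass to of_platform_populate();
 * the empty entry terminates it. Compatible and address are made up. */
static struct of_dev_auxdata board_auxdata[] = {
    { "vendor,fpga-dma", 0xFF200000u, "fpga-dma", &buf_info },
    { NULL, 0, NULL, NULL },
};

/* Simplified match: same compatible string and same physical address. */
static const struct of_dev_auxdata *
model_dev_lookup(const struct of_dev_auxdata *lookup, const struct fake_node *np)
{
    for (; lookup && lookup->compatible; lookup++)
        if (strcmp(np->compatible, lookup->compatible) == 0 &&
            np->reg_base == lookup->phys_addr)
            return lookup;
    return NULL;
}
```

Keying on both the compatible string and the address lets the same table distinguish two instances of one IP block, which is why phys_addr is part of the structure.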

In summary, we identified two callbacks that put everything together. The reserve callback in the machine_desc struct can be used to pre-allocate memory away from the kernel, and the init_machine callback can be used to pass any custom data structure to the driver during board bring-up. We also identified two ways of navigating the kernel: using ripgrep to search a local copy of the kernel source, and using Bootlin's cross-reference of the kernel.

I hope this information will help you write your own customization routines during the kernel boot up process for your own custom board. Feel free to reach out to me at mab@mab-labs.com if you run into any issues and for any of your embedded software needs.