What about the ac (Alan Cox) series of patches ?

  • What's the difference with Linus' kernels?
    Alan's kernel can be seen as a test bed for Linus' kernels. While Linus is very conservative and only applies obvious and well tested patches to the 2.4 kernel, Alan maintains a set of kernel patches that contains new concepts, more and/or newer drivers, and more intrusive patches. If the patches prove themselves stable, Alan submits them to Linus to include them into the official kernel. (Note that currently the -ac tree is essentially in limbo, mostly containing patches that need to be forwarded to Marcelo for 2.4. No -ac tree exists for 2.5)

    While the above is the intent of the -ac series, in practice only the -ac series contains all the bugfixes that are posted on the kernel mailing list(s).

  • Where do I get them ?
    Go to ftp.xx.kernel.org/pub/linux/kernel/people/alan/ where "xx" is your country code, e.g. "uk"

  • How do I apply them ?
    Example of how to patch from 2.4.2 -> 2.4.2-ac20. (Assuming linux-2.4.2.tar.bz2 and patch-2.4.2-ac20.bz2 have both been downloaded to the same directory.)
    bzip2 -dc linux-2.4.2.tar.bz2 | tar xvf -
    cd linux
    bzip2 -dc ../patch-2.4.2-ac20.bz2 | patch -p1

  • I downloaded the .gz files instead of .bz2
    Same process, just different program to unpack.
    gzip -dc linux-2.4.2.tar.gz | tar xvf -
    cd linux
    gzip -dc ../patch-2.4.2-ac20.gz | patch -p1

  • But I already applied an ac patch !
    Then back it out first :

    bzip2 -dc patch-2.4.2-ac19.bz2 | patch -p1 -R
    bzip2 -dc patch-2.4.2-ac20.bz2 | patch -p1

  • Yuck. Why can't there be "incremental diffs" between ac revisions ?
    If you're lucky, there are, at http://www.bzimage.org/ or ftp://sunsite.icm.edu.pl/pub/Linux/kernel/incr/2.4



What is asmlinkage ?

The asmlinkage tag is one other thing that we should observe about this simple function. This is a #define for some gcc magic that tells the compiler that the function should not expect to find any of its arguments in registers (a common optimization), but only on the CPU's stack. Recall our earlier assertion that system_call consumes its first argument, the system call number, and allows up to four more arguments that are passed along to the real system call. system_call achieves this feat simply by leaving its other arguments (which were passed to it in registers) on the stack. All system calls are marked with the asmlinkage tag, so they all look to the stack for arguments. Of course, in sys_ni_syscall's case, this doesn't make any difference, because sys_ni_syscall doesn't take any arguments, but it's an issue for most other system calls. And, because you'll be seeing asmlinkage in front of many other functions, I thought you should know what it was about.
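
As a rough illustration of the "gcc magic", here is a userspace sketch (not kernel code). It assumes the i386 definition of asmlinkage is roughly __attribute__((regparm(0))), meaning "take zero arguments in registers". The name sys_demo_add is hypothetical, and other architectures are assumed here to need no attribute at all.

```c
/* Userspace sketch, not kernel code. On i386 the kernel defines
 * asmlinkage roughly as __attribute__((regparm(0))): take zero
 * arguments in registers, i.e. read them all from the stack.
 * Other architectures are assumed here to need no attribute. */
#if defined(__i386__)
#define asmlinkage __attribute__((regparm(0)))
#else
#define asmlinkage
#endif

/* sys_demo_add is a hypothetical stand-in for a real system call. */
asmlinkage long sys_demo_add(long a, long b)
{
        return a + b;
}
```

The attribute changes only where the compiler looks for a and b; the C-level behaviour of the function is the same either way.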


Where do I begin ?

A common question asked by a newbie is "I've just unpacked this huge tarball, and I want to help out, but I don't know where to start!"

It may seem daunting to be confronted with such a large amount of source code, but bear in mind that very few kernel hackers understand every area of the kernel tree.

People specialise. If you're interested in TCP/IP, you'll not be needing to read the filesystem code. Figure out what it is you want to be working on, and focus on that.

Linux is a professional-quality kernel. This makes it difficult to come up with small "student projects" by which you can learn: often features are already implemented, and at a level that requires a good understanding before you can hack on them. However, there are several practical and useful things you can do until you have learned enough to really start hacking :

Test and benchmark
New code is constantly evolving, so benchmark it. You will certainly notice some odd behaviours: there is your impetus to understand where the behaviour is coming from. Profile and trace things (e.g. with LTT), and see if you can work out what might be causing problems. You'll learn the code by accident. Try out experimental patches posted to linux-kernel and trees like mjc's. Try to understand what a particular patch does, and how it does it.
Document
Sounds boring ? Maybe, but you'll be doing everybody a favour, not least yourself. Forcing yourself to explain things crystallises your own understanding. Documenting behaviour requires you to understand the code. You'll find code a lot easier to read if you are trying to answer a specific question. Write articles for kernelnewbies and get them peer-reviewed in the IRC channel. Identify inaccuracies in the current man-pages, and fix them. Add source docs to the kernel source.
Kernel janitors
Kernel janitors is a project to fix mis-use of kernel APIs as the code mutates. This can quickly get pretty interesting. An educational talk on the project can be read here.



How do I compile a kernel ?

(These instructions assume we are installing version 2.4.0 of the kernel; replace all instances with the version you are trying to build. These instructions are also x86-specific; other architectures' build procedures may differ.)

  • Download your tarball from ftp.XX.kernel.org where XX is your country code. If there isn't a mirror for your country, just pick a near one.
  • Unpack the tarball in your /usr/src directory
       bzip2 -dc linux-2.4.0.tar.bz2 | tar xvf -
    (Replace bzip2 with gzip if you downloaded the .gz)
  • cd into the linux directory. You'll now need to configure the kernel to select the features you want/need. There are several ways to do this..
       a. make config
          Command line questions.
       b. make oldconfig
          (Useful only if you kept a .config from a previous
          kernel build)
       c. make menuconfig
          (ncurses based)
       d. make xconfig
          (TCL/TK based X-Windows configuration)
    
  • Now we can build the kernel, but first we have to build the dependencies.
       make dep
       make bzImage
    
  • Wait. When it's finished, make any parts you may have selected to be modular.
       make modules
    

  • Become root to install the modules and kernel. Everything before this point can and should be done as a normal user; there is no need to be root to compile a kernel. Doing everything as root is a very bad idea, because root is too powerful: a single mistake is enough to ruin your system completely.
  • Install the modules.
       make modules_install
    
  • Install the new kernel..
       cp arch/i386/boot/bzImage /boot/vmlinuz-2.4.0
       cp System.map /boot/System.map-2.4.0
    
  • Edit /etc/lilo.conf, and add these lines...
       image = /boot/vmlinuz-2.4.0
       label = 2.4.0
    
    Also copy your root=/dev/??? line here too.


  • Run /sbin/lilo, reboot, and enjoy. If you get modversion problems (symbols ending in _Rxxxxxxxx), have a look at this question in the linux-kernel mailing list FAQ to solve the problem.
Still not getting it? Try this more in-depth tutorial.


Does ... ever come here ?

Linus Torvalds.

Please, be serious...

Alan Cox.

Alan visits from time to time.


How do I apply a patch ?

The answer depends on how the patch was created, specifically on which directory the diff was generated from. In general though, patches are made from the root of the source tree (/usr/src/linux), and the following assumes this.

For example's sake, you have unpacked a tarball of Linux 2.4.0, and you want to apply Linus' patch-2.4.1.bz2, which you have placed in /usr/src. Do the following :

	cd /usr/src/linux
	bzip2 -dc /usr/src/patch-2.4.1.bz2 | patch -p1 --dry-run
We used the --dry-run option to check that the patch applies cleanly. This can be a life-saver sometimes as it can be a real pain to back out a partially-applied patch. The -p1 option strips off part of the diff file's pathnames for each changed file (see the patch(1) manpage for more details). Now you've checked that it should apply cleanly, do :
	bzip2 -dc /usr/src/patch-2.4.1.bz2 | patch -p1
to actually apply it. You're done !
This is actually simple with Linus' standard patches, as you can use the script linux/scripts/patch-kernel to automatically do the patches for you.

The situation with other patches is not always so simple. For example, Linus' pre patches (found in pub/linux/kernel/testing) are not incremental. That is, pre10.bz2 must be applied on top of the tarball of the previous full release kernel. E.g., patch-2.4.8-pre2 goes on top of an unpacked 2.4.7 tarball, *not* on top of a patched 2.4.8-pre1 kernel. If you have a 2.4.8-pre1 kernel, you can get back to 2.4.7 by following the section 'Reversing a patch' below.

Alan Cox's ac patches (pub/linux/kernel/people/alan/) follow the same method, unless you get the incremental patches from bzimage.org.

Occasionally you may want to test a patch from linux-kernel or similar. Generally these will be incremental against the named version (so, say, 2.4.0-test1-ac22-hosedmm.diff should be applied against 2.4.0-test1-ac22), and relative to the root. You may need to play with the -p option.

Reversing a patch

You've applied several patches, and now you want to remove them. Simply use the -R option to patch, with the same patch file, to back out the patch (alas, the patch(1) manpage is less than clear on this).


Who can I find on #kernelnewbies ?

Real name           Nick      Kernel responsibility
Anton Altaparmakov  AntonA    ntfs
Arjan van de Ven    arjan     kHTTPd, Powertweak
Andre Hedrick       ata       IDE guru
Jens Axboe          axboe     CDROM/DVD layer
Ralf Baechle        Bacchus   Linux-MIPS
Ben LaHaise         bcrl      Memory management
Dave Jones          davej     2.5-dj tree maintainer, Powertweak, random hacks
Erik Mouw           erikm     ARM Linux, SA1100-Linux
f00f                f00f      Larting, jumping, and logging
Greg Kroah          gregkh    USB
Christoph Hellwig   hch       Filesystems, kbuild, kernel cleanup
Jeff Dike           jdike     User Mode Linux
Jeff Garzik         jgarzik   Network drivers, PCI, kernel cleanup
lxrbot              lxrbot    The channel oracle. Ask it for definitions and uses in kernel source, or query its factoid database.
Thiago Rondon       maluco    Random hacking
Marcelo W. Tosatti  marcelo   2.4 maintainer
Michael J. Cohen    mjc       Maintainer of -mjc kernel tree
John Levon          movement  oprofile, random minor hacking
Fabio O. Leite      olive     drbd, High Availability, heartbeat
Daniel Phillips     phillips  TUX2 filesystem, ext2 improvement, memory management hacking
Juan Quintela       quintela  Memory management
Rik van Riel        riel      Memory management
Russell King        rmk       ARM Linux
Tigran Aivazian     tigran    Random hacking
Momchil Velikov     velco     VM hacker etc.
Alexander Viro      viro      VFS guru
William Lee Irwin   wli       VM hacking, bootmem, more



Why do a lot of #defines in the kernel use do { ... } while(0)?

There are a couple of reasons:

  • (from Dave Miller) Empty statements give a warning from the compiler so this is why you see #define FOO do { } while(0).
  • (from Dave Miller) It gives you a basic block in which to declare local variables.
  • (from Ben Collins) It allows you to use more complex macros in conditional code. Imagine a macro of several lines of code like:
    #define FOO(x) \
            printf("arg is %s\n", x); \
            do_something_useful(x);
    
    Now imagine using it like:
            if (blah == 2)
                    FOO(blah);
    This expands to:
            if (blah == 2)
                    printf("arg is %s\n", blah);
                    do_something_useful(blah);;
    
    As you can see, the if only covers the printf(); the do_something_useful() call is unconditional (not within the scope of the if), which is not what you wanted. So, by using a block like do{...}while(0), you would get this:
            if (blah == 2)
                    do {
                            printf("arg is %s\n", blah);
                            do_something_useful(blah);
                    } while (0);
    
    Which is exactly what you want.
  • (from Per Persson) As both Miller and Collins point out, you want a block statement so you can have several lines of code and declare local variables. But then the natural thing would be to just use for example:
      #define exch(x,y) { int tmp; tmp=x; x=y; y=tmp; }
    
    However that wouldn't work in some cases. The following code is meant to be an if-statement with two branches:
      if(x>y)
        exch(x,y);          // Branch 1
      else  
        do_something();     // Branch 2
    
    But it would be interpreted as an if-statement with only one branch:
      if(x>y) {                     // Single-branch if-statement!!!
        int tmp;            // The one and only branch consists
        tmp = x;            // of the block.
        x = y;
        y = tmp;
      }
      ;                             // empty statement
      else                  // ERROR!!! "parse error before else"
        do_something();
    
    The problem is the semi-colon (;) coming directly after the block.

    The solution for this is to sandwich the block between do and while(0). Then we have a single statement with the capabilities of a block, but not considered as being a block statement by the compiler.

    Our if-statement now becomes:
      if(x>y)
        do {
          int tmp;
          tmp = x;
          x = y;
          y = tmp;
        } while(0);
      else
        do_something();
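
    The pieces above can be put together in a small compilable sketch. The helper order_pair() is hypothetical and exists only to show the macro used in front of an else; the macro body is a slightly more defensive variant of exch (arguments parenthesised).

```c
/* Swap two ints. The do { } while (0) wrapper makes the macro expand
 * to exactly one statement, so "exch(a, b);" is safe before an else. */
#define exch(x, y) do { int tmp = (x); (x) = (y); (y) = tmp; } while (0)

/* Hypothetical helper: make sure *lo <= *hi.
 * Returns 1 if the values were swapped, 0 if they were left alone. */
static int order_pair(int *lo, int *hi)
{
        if (*lo > *hi)
                exch(*lo, *hi);     /* one statement, then the ';' */
        else
                return 0;
        return 1;
}
```

    With the plain { ... } version of the macro, the same if/else would fail to compile because of the stray semicolon before else.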
    



How does get_current() work ?

static inline struct task_struct * get_current(void)
{
        struct task_struct *current;
        __asm__("andl %%esp,%0; ":"=r" (current) : "0" (~8191UL));
        return current;
}

get_current() is a routine for getting access to the task_struct of the currently executing task. It uses the often confusing inline assembly features of GCC to perform this, as follows :

| __asm__(

This signifies a piece of inline assembly that the compiler must insert into its output code. The __asm__ is the same as asm, but can't be disabled by command line flags.

| "andl %%esp,%0

"%%" is an escape that produces a literal "%", so "%%esp" names the register %esp.
"%0" is a placeholder for the first operand in the input/output specification.

So in this case, it takes the stack pointer (register %esp) and ANDs it into a register that contains 0xFFFFE000, leaving the result in that register.

Basically, the task's task_struct and a task's kernel stack occupy an 8KB block that is 8KB aligned, with the task_struct at the beginning and the stack growing from the end downwards. So you can find the task_struct by clearing the bottom 13 bits of the stack pointer value.

| ; "

The semicolon can be used to separate assembly statements, as can the newline character escape sequence ("\n").

| :"=r" (current)

This specifies an output constraint (all of which occur after the first colon, but before the second). The '=' also specifies that this is an output. The 'r' indicates that a general purpose register should be allocated such that the instruction can place the output value into it. The bit inside the brackets - 'current' - is the intended destination of the output value (normally a local variable) once the C part is returned to.

| : "0" (~8191UL));

This specifies an input constraint (all of which occur after the second colon, but before the third). The '0' references another constraint (in this case, the first output constraint), saying that the same register or memory location should be used for both. The '~8191UL' inside the brackets is a constant that should be loaded into the register allocated for the output value before using the instructions inside the asm block.

See also the gcc info pages, Topic "C Extensions", subtopic "Extended Asm".

(Mostly courtesy of David Howells of Redhat).
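
The arithmetic behind the andl can be tried out in portable C. This is only a sketch of the masking step, assuming the 8KB stack size described above; the names DEMO_THREAD_SIZE and task_struct_base are made up for illustration and do not appear in the kernel.

```c
#include <stdint.h>

/* Assumption from the text: task_struct and kernel stack share one
 * 8KB block that is 8KB aligned. */
#define DEMO_THREAD_SIZE 8192UL

/* Clear the bottom 13 bits of a stack pointer value, which yields the
 * start of the 8KB block, i.e. where the task_struct would live. */
static uintptr_t task_struct_base(uintptr_t sp)
{
        return sp & ~(uintptr_t)(DEMO_THREAD_SIZE - 1);
}
```

For example, a stack pointer of 0xC0123ABC maps to a block starting at 0xC0122000, and so does every other address inside that block.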


How do I compile a module ?

Kernel modules must be compiled in a certain way in order to build successfully against the kernel.
The average module compile line will look something like this on 2.4 :

gcc -o mymod.o -Wall -W -O2 -DMODULE -D__KERNEL__ -I/lib/modules/`uname -r`/build/include -c my_mod.c

You must use the version of gcc you compiled the kernel with.
You must specify -O2 in order to inline necessary code.
You must make sure to build against the real kernel headers of the kernel you're compiling against. Using the (default) /usr/include/linux and friends is not good enough. The above example assumes you're building against the running kernel; if not, alter the path given to the -I option.

MODVERSIONS

Modversions is the versioning system for kernel exported symbols. To build against it :
  • You must add -DMODVERSIONS to the compile line
  • The first include must be <linux/module.h>
  • If you are building a module from multiple source files, you must #define __NO_VERSION__ in all but one of the .c source files. Note this is not necessary with recent kernel versions.
  • You must not include modversions.h directly if your module is included in the kernel source. If you are building externally, include the correct modversions.h only when module versions are enabled (it is best to add it to the compile line as the kernel does).

Please also read http://www.tux.org/lkml/, especially questions 8-7 and 8-8

The proper way to build modules

The reliable, future-proof, way to build external modules is to leverage the kernel's build system to do the hard work for you. Use autoconf or a similar mechanism to discover the kernel source tree (defaulting to the /lib/modules path above). Read Documentation/kbuild/ to find out what your Makefile should look like for the module build itself, and then arrange to call make from the kernel base dir, setting SUBDIRS as necessary. For example, if your generated Makefile is in /home/luser/src/mymodule-0.1/module/, and the user's kernel source tree is /usr/src/linux-2.5, then the makefile fragment might look like :

kernel_module:
   $(MAKE) -C /usr/src/linux-2.5 SUBDIRS=/home/luser/src/mymodule-0.1/module/ modules

This will set all the flags you need correctly in a future-proof fashion by using the kernel build machinery itself to generate the module. You can find a very simple example for 2.6 kernels here : sillymod.tar.gz. It builds with the standard "./configure; make; make install"


What does an lsmod count of -1 mean ?

All it means is that the module has declared a ->can_unload() function.

Normal modules are reference counted via MOD_INC_USE_COUNT and friends. Modules with a can_unload() function, however, have decided to manage use of the module via the function, which returns -EBUSY when the module is not unloadable.

A value of -1 means that lsmod is not able to determine the unloadability of the module, and the only way to find out is to try rmmod.
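
The rule can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel's code: struct demo_module and the function names are hypothetical, and only the two fields relevant to the question are modelled.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's module bookkeeping. */
struct demo_module {
        int usecount;               /* MOD_INC_USE_COUNT-style counter */
        int (*can_unload)(void);    /* NULL unless the module provides it */
};

/* Example can_unload() that always refuses; -16 stands in for -EBUSY. */
static int always_busy(void)
{
        return -16;
}

/* What an lsmod-like tool is told: the real counter, or -1 when the
 * module manages its own unloadability. */
static int reported_count(const struct demo_module *m)
{
        return m->can_unload ? -1 : m->usecount;
}

/* What an rmmod-like check consults: nonzero means "busy". */
static int module_busy(const struct demo_module *m)
{
        return m->can_unload ? m->can_unload() : m->usecount;
}
```

In this model the -1 carries no information about whether the module is busy; only calling can_unload(), as rmmod effectively does, answers that.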


What's the difference between extern and static inline ?

Let Linus explain :

 - "static inline" means "we have to have this function, if you use it
   but don't inline it, then make a static version of it in this
   compilation unit"

 - "extern inline" means "I actually _have_ an extern for this function,
   but if you want to inline it, here's the inline-version"

But also see Inline Functions in C.


What is System.map ?

System.map is a file (produced via nm) containing symbol names and addresses of the linux kernel binary, vmlinux.

Its primary use is in debugging. If a kernel "oops" message appears, the utility ksymoops can be used to decode the message into something useful for developers. ksymoops makes use of the System.map to map PC values to symbol names. Note that 2.5 kernels have an in-kernel oops decoder called kksymoops, which does not need System.map.

You may get warnings about your System.map being out of date. This won't affect normal running, but it's best to keep a matching copy around in case there is a kernel bug or hardware failure. Note that ps l uses System.map to determine the WCHAN field (you can specify a map file with the PS_SYSTEM_MAP environment variable). The utilities look in a set of standard places for this file, such as /boot/System.map and /usr/src/linux/System.map.
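
Since the file is nm output, each line has the form "address type name". The following sketch parses one such line; struct demo_ksym and parse_map_line are hypothetical names, and the 127-character name limit is an arbitrary assumption made for the example.

```c
#include <stdio.h>

/* One entry of a System.map-style file, e.g. "c0105000 T _stext". */
struct demo_ksym {
        unsigned long addr;     /* symbol address (hex in the file) */
        char type;              /* nm type letter, e.g. 'T' for text */
        char name[128];         /* symbol name, truncated at 127 chars */
};

/* Parse a single line. Returns 0 on success, -1 if the line does not
 * have the expected three fields. */
static int parse_map_line(const char *line, struct demo_ksym *out)
{
        int fields = sscanf(line, "%lx %c %127s",
                            &out->addr, &out->type, out->name);

        return fields == 3 ? 0 : -1;
}
```

A decoder in the style of ksymoops would load all entries this way and then, for a given PC value, pick the entry with the highest address that is not above it.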


What's going on with the kernel headers ?

On any distribution, there are two sets of kernel headers :

System kernel headers

These are the kernel headers actually used by the system. These are the headers you compile user-space utilities against. They must be installed to compile anything in userspace.

The headers are usually found in /usr/include/asm and /usr/include/linux. They are copies and should never be replaced (unless you are doing a C library upgrade). These headers contain compatibility code etc. to allow them to be used with a variety of different running kernels, and are conceptually part of the glibc package. They can often be found in the kernel-headers or libc6-dev RPM/package.

Kernel source headers

These are the kernel header files that are part of the kernel source package. They should never be used for compiling user-space programs. Old Linux distributions often made /usr/include/linux and /usr/include/asm symlinks to the right parts of the kernel source tree installed in /usr/src/linux. This is the wrong thing to do - userspace programs must use copies of the kernel headers, suitably modified.

Conversely, when compiling the kernel, or kernel modules, these headers must be used. This is important when compiling externally packaged modules - the module build should look in the right place for the headers (by e.g. adding -I/lib/modules/`uname -r`/build/include).

Read Linus' explanation of the situation.


What major/minor does XXX have ?

Check devices.txt.


What are the various kernel trees for ?

-ac
Maintainer: Alan Cox
Pending patches for sending to Marcelo (for 2.4 series), and extra add-ons, fixes. etc.
-mm
Maintainer: Andrew Morton
Fancy new features and fixes with a focus on VM hacks.
-aa
Maintainer: Andrea Arcangeli
VM updates, a multitude of fixes and various improvements from Andrea.
-dj
Maintainer: Dave Jones
Forward ports of 2.4 bugfixes to 2.5 series, plus some other bits. (a slightly less bloody bleeding edge)
-ck
Maintainer: Con Kolivas
A stable 2.4 based patchset with a focus on performance tweaks to the scheduler and vm, with specific tuning for the desktop to improve system responsiveness.
-osdl
For data center or carrier grade linux, tuning especially for large machines and high database performance.
-rmap
Maintainer: Rik van Riel
Rmap has a reverse mapping from page frames to virtual mappings mostly in order to make a more predictable VM, to get rid of some worst case VM behaviours and smooth things out. The reverse mappings provide infrastructure to make a more flexible VM possible ... which means that VM strategies in -rmap often change.



How do I intercept system calls ?

Use something like Linux Trace Toolkit probably.

There is also a horrible hack based on modifying entries in the system call table. This is strongly discouraged - it is not safe against module unloading, it is not architecture independent, and it is just ugly anyway.

Having said that, it seems to be a common task for those learning their way around kernel hacking. Check out the syscalltrack module for some code that actually does this.

Basically, each pointer value of interest in the global sys_call_table is modified to point to a new address supplied by the kernel module. In this way, when a process makes a system call, it will end up in your routine. You can then call the old value saved from the system call table to do the real work of the request, after collecting whatever info you need.
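
The save-and-replace pattern can be shown with an ordinary array of function pointers in userspace. This is a model only: demo_call_table, demo_real_call and the other names are hypothetical, and nothing here touches the real sys_call_table.

```c
/* A system-call-like function type and a one-entry stand-in table. */
typedef long (*demo_call_fn)(long);

static long demo_real_call(long arg)
{
        return arg + 1;             /* pretend this is the real work */
}

static demo_call_fn demo_call_table[] = { demo_real_call };

static demo_call_fn saved_entry;    /* original pointer, kept for chaining */
static unsigned long hook_hits;     /* info collected by the hook */

/* Replacement routine: record the call, then chain to the original. */
static long demo_hook(long arg)
{
        hook_hits++;
        return saved_entry(arg);
}

static void install_hook(void)
{
        saved_entry = demo_call_table[0];
        demo_call_table[0] = demo_hook;
}

static void remove_hook(void)
{
        demo_call_table[0] = saved_entry;
}
```

Even in this toy form the unloading hazard is visible: if a second hook were installed on top and this one removed first, the table would end up pointing at stale code.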

This fails horribly for the execve system call, and there's a very good reason for this. Let's look at the prototype of sys_execve() :

asmlinkage int sys_execve(struct pt_regs regs)

Note the argument - that is not a pointer ! Your attempt to intercept sys_execve in the same way is not going to work. The argument indicates that the process's registers have been saved on the stack. Code inside sys_execve actually modifies these stack locations to set the saved PC register to the start of the new executable. If your wrapper calls the saved pointer, the callee only sees a copy of the structure and its changes are lost - so you must let the code access the original location on the stack !

For example code that does the modification of the registers, see start_thread() called from load_elf_binary().

The simplest way to get around this problem is by calling do_execve() instead of the saved old sys_execve pointer value, duplicating the kernel's sys_execve() code. Ugly huh ? Please don't ever do this in real code. If you want to provide some code in a module that kernel code needs to call, provide a hook in the kernel code as a patch, then a module on top of that (an example of this is sys_nfsservctl()).

Note that Linus removed the export of sys_call_table in 2.5 kernels.


Can I use library functions in the kernel ?

System libraries (such as glibc, libreadline, libproplist, whatever) that are typically available to userspace programmers are unavailable to kernel programmers. When a process is being loaded, the loader will automatically load any dependent libraries into the address space of the process. None of this mechanism is available to kernel programmers: forget about ISO C libraries, the only things available are what is already implemented (and exported) in the kernel and what you can implement yourself.

Note that it is possible to "convert" libraries to work in the kernel; however, they won't fit well, the process is tedious and error-prone, and there might be significant problems with stack handling (the kernel is limited to a small amount of stack space, while userspace programs don't have this limitation) causing random memory corruption.

Many of the commonly requested functions have already been implemented in the kernel, sometimes in "lightweight" versions that aren't as featureful as their userland counterparts. Be sure to grep the headers for any functions you might be able to use before writing your own version from scratch. Some of the most commonly used ones are in include/linux/string.h.
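
To give a feel for what "implement yourself" looks like, here is a sketch of a bounded string-length routine written without any C library calls. The name demo_strnlen is hypothetical, chosen to avoid clashing with any existing function; check the kernel headers first before writing something like this for real.

```c
#include <stddef.h>     /* size_t only; no library code is pulled in */

/* Length of s, but never look at more than max bytes. Uses nothing
 * beyond the language itself, so it has no libc dependency. */
static size_t demo_strnlen(const char *s, size_t max)
{
        size_t n = 0;

        while (n < max && s[n] != '\0')
                n++;
        return n;
}
```

Routines in this style also keep stack usage tiny, which matters given the small kernel stack mentioned above.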

Whenever you feel you need a library function, you should consider your design, and ask yourself if you could move some or all the code into user-space instead.


Are there any good IDEs? How do I handle all this code?

When dealing with a source base as large as the kernel, it certainly helps to have software tools to help understand how the pieces fit together. Perhaps the most important tool is a good programmer's text editor. Popular choices are emacs and any vi clone, such as vim. Generally, text editors written for programmers are programmable and have features such as syntax highlighting, text folding, brace matching, and easy integration with source management tools, such as make(1), cvs(1), text reformatting, man page lookups, and more.

Most popular is a tool to quickly find uses, definitions, and declarations of C symbols. grep(1) is almost always available, and the more powerful version, egrep(1), is very useful to know. But grep(1) requires searching every file on every lookup. Tools such as cscope, freescope, etags, ctags, and idutils build databases to use when searching for C symbols. Each has its own idiosyncrasies and features. Some integrate better with your text editor of choice. (Look especially for plugins to help with integration.)

cgvg is another option, though it doesn't appear to use a database to speed searches.