let me start: 4/1/07

Monday, April 30, 2007

IPsec (IP Security)

IPsec (IP security) is a suite of protocols for securing Internet Protocol (IP) communications by authenticating and/or encrypting each IP packet in a data stream. IPsec also includes protocols for cryptographic key establishment.

Summary

IPsec protocols operate at the network layer, layer 3 of the OSI model. Other Internet security protocols in widespread use, such as SSL and TLS, operate from the transport layer up (OSI layers 4 - 7). This makes IPsec more flexible, as it can be used for protecting both TCP- and UDP-based protocols, but increases its complexity and processing overhead, as it cannot rely on TCP (OSI layer 4) to manage reliability and fragmentation.

Modes

There are two modes of IPsec operation: transport mode and tunnel mode.

Transport mode

In transport mode only the payload (message) of the IP packet is encrypted. The routing is intact since the IP header is neither modified nor encrypted; however, when the Authentication Header is used, the IP addresses cannot be translated, as this will invalidate the hash value. The transport and application layers are always secured by hash so they cannot be modified in any way (for example by translating the port numbers). Transport mode is used for host-to-host communications.

A means to encapsulate IPsec messages for NAT traversal has been defined by RFC documents describing the NAT-T mechanism.

Tunnel mode

In tunnel mode, the entire IP packet is encrypted. It must then be encapsulated into a new IP packet for routing to work. Tunnel mode is used for network-to-network communications (secure tunnels between routers) or host-to-network and host-to-host communications over the Internet.

Security architecture

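IPsec is implemented by a set of cryptographic protocols for (1) securing packet flows and (2) Internet key exchange. There are two protocols of the former kind: the Authentication Header (AH) and the Encapsulating Security Payload (ESP), both described under Technical details below.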

Current status as a standard

IPsec is a mandatory part of IPv6, and is optional for use with IPv4. While the standard is designed to be indifferent to IP versions, current widespread deployment and experience concern IPv4 implementations. IPsec protocols were originally defined by RFCs 1825–1829, published in 1995. In 1998, these documents were obsoleted by RFCs 2401–2412. RFCs 2401–2412 are not compatible with 1825–1829, although they are conceptually identical. In December 2005, third-generation documents, RFCs 4301–4309, were produced. They are largely a superset of 2401–2412, but provide a second Internet Key Exchange standard. These third-generation documents standardized the abbreviation of IPsec to uppercase “IP” and lowercase “sec”.

It is unusual to see any product that offers RFC1825–1829 support. “ESP” generally refers to 2406, while ESPbis refers to 4303.

Design intent

IPsec was intended to provide either transport mode: end-to-end security of packet traffic in which the end-point computers do the security processing, or tunnel mode: portal-to-portal communications security in which security of packet traffic is provided to several machines (even to whole LANs) by a single node.

IPsec can be used to create Virtual Private Networks (VPN) in either mode, and this is the dominant use. Note, however, that the security implications are quite different between the two operational modes.

End-to-end communication security on an Internet-wide scale has been slower to develop than many had expected. Part of the reason is that no universal, or universally trusted, Public Key Infrastructure (PKI) has emerged (DNSSEC was originally envisioned for this); part is that many users understand neither their needs nor the available options well enough to promote inclusion in vendors' products.

Since the Internet Protocol does not inherently provide any security capabilities, IPsec was introduced to provide security services such as:

Encrypting traffic (so it cannot be read by parties other than those for whom it is intended)
Integrity validation (ensuring traffic has not been modified along its path)
Authenticating the peers (ensuring that traffic is from a trusted party)
Anti-replay (protection against replay of the secure session)

Technical details

Authentication Header (AH)

The AH is intended to guarantee connectionless integrity and data origin authentication of IP datagrams. Further, it can optionally protect against replay attacks by using the sliding window technique and discarding old packets. AH protects the IP payload and all header fields of an IP datagram except for mutable fields, i.e. those that might be altered in transit. In IPv4, the mutable (and therefore unauthenticated) IP header fields include TOS, Flags, Fragment Offset, TTL and Header Checksum. AH operates directly on top of IP using IP protocol number 51. IP itself does not depend on IPsec.

An AH packet diagram:

0 - 7 bit	8 - 15 bit	16 - 23 bit	24 - 31 bit
Next Header	Payload Length	RESERVED (16 bits)
Security Parameters Index (SPI) (32 bits)
Sequence Number (32 bits)
Authentication Data (variable)

Field meanings:

Next Header: Identifies the protocol of the transferred data.
Payload Length: Length of the AH in 32-bit words, minus 2.
RESERVED: Reserved for future use (all zero until then).
Security Parameters Index (SPI): Together with the IP address, identifies the Security Association that applies to this packet.
Sequence Number: A monotonically increasing number, used to prevent replay attacks.
Authentication Data: Contains the integrity check value (ICV) necessary to authenticate the packet; it may contain padding.

Encapsulating Security Payload (ESP)

The Encapsulating Security Payload (ESP) extension header provides origin authenticity, integrity, and confidentiality protection of a packet. ESP also supports encryption-only and authentication-only configurations, but using encryption without authentication is strongly discouraged because it is insecure. Unlike AH, ESP does not protect the IP packet header. ESP operates directly on top of IP using IP protocol number 50.

An ESP packet diagram:

0 - 7 bit	8 - 15 bit	16 - 23 bit	24 - 31 bit
Security Parameters Index (SPI)
Sequence Number
Payload Data (variable)
	Padding (0-255 bytes)
		Pad Length	Next Header
Authentication Data (variable)

Field meanings:

Security Parameters Index (SPI): Together with the IP address, identifies the Security Association that applies to this packet.
Sequence Number: A monotonically increasing number, used to prevent replay attacks.
Payload Data: The data to be transferred.
Padding: Used with some block ciphers to pad the data to the full length of a block.
Pad Length: Size of padding in bytes.
Next Header: Identifies the protocol of the transferred data.
Authentication Data: Contains the data used to authenticate the packet.

Thursday, April 26, 2007

Cool Linux Sites

These are Linux sites maintained by people with lots of free time.

Linux Documentation Project
The canonical set of Linux online and printed documentation.
Linux Online
Linux information.
linux.org.uk
Linux information from Great Britain (very good!)
Linux International
An organization for promoting the use of Linux.
The linux-kernel mailing list FAQ
Answers to frequently-asked questions about the Linux kernel (including how to submit patches)
"A small trail through the Linux kernel"
A walk-through of what the kernel does when it runs a small demonstration program.
Linux kernel source finder
A list of where to get architecture-specific kernel sources and patches.

Wednesday, April 25, 2007

Debugging kernel with printk

The most common debugging technique is monitoring, which in applications programming is done by calling printf at suitable points. When you are debugging kernel code, you can accomplish the same goal with printk.

4.2.1. printk

We use the printk function with the simplifying assumption that it works like printf. Now it's time to introduce some of the differences.

One of the differences is that printk lets you classify messages according to their severity by associating different loglevels, or priorities, with the messages. You usually indicate the loglevel with a macro. For example, KERN_INFO, which we saw prepended to some of the earlier print statements, is one of the possible loglevels of the message. The loglevel macro expands to a string, which is concatenated to the message text at compile time; that's why there is no comma between the priority and the format string in the following examples. Here are two examples of printk commands, a debug message and a critical message:

printk(KERN_DEBUG "Here I am: %s:%i\n", __FILE__, __LINE__);
printk(KERN_CRIT "I'm trashed; giving up on %p\n", ptr);

There are eight possible loglevel strings, defined in the kernel headers; we list them in order of decreasing severity:

KERN_EMERG: Used for emergency messages, usually those that precede a crash.
KERN_ALERT: A situation requiring immediate action.
KERN_CRIT: Critical conditions, often related to serious hardware or software failures.
KERN_ERR: Used to report error conditions; device drivers often use KERN_ERR to report hardware difficulties.
KERN_WARNING: Warnings about problematic situations that do not, in themselves, create serious problems with the system.
KERN_NOTICE: Situations that are normal, but still worthy of note. A number of security-related conditions are reported at this level.
KERN_INFO: Informational messages. Many drivers print information about the hardware they find at startup time at this level.
KERN_DEBUG: Used for debugging messages.

Each string (in the macro expansion) represents an integer in angle brackets. Integers range from 0 to 7, with smaller values representing higher priorities.

A printk statement with no specified priority defaults to DEFAULT_MESSAGE_LOGLEVEL, specified in kernel/printk.c as an integer. In the 2.6.10 kernel, DEFAULT_MESSAGE_LOGLEVEL is KERN_WARNING, but that has been known to change in the past.

Based on the loglevel, the kernel may print the message to the current console, be it a text-mode terminal, a serial port, or a parallel printer. If the priority is less than the integer variable console_loglevel, the message is delivered to the console one line at a time (nothing is sent unless a trailing newline is provided). If both klogd and syslogd are running on the system, kernel messages are appended to /var/log/messages (or otherwise treated depending on your syslogd configuration), independent of console_loglevel. If klogd is not running, the message won't reach user space unless you read /proc/kmsg (which is often most easily done with the dmesg command). When using klogd, you should remember that it doesn't save consecutive identical lines; it only saves the first such line and, at a later time, the number of repetitions it received.

The variable console_loglevel is initialized to DEFAULT_CONSOLE_LOGLEVEL and can be modified through the sys_syslog system call. One way to change it is by specifying the -c switch when invoking klogd, as specified in the klogd manpage. Note that to change the current value, you must first kill klogd and then restart it with the -c option. Alternatively, you can write a program to change the console loglevel. You'll find a version of such a program in misc-progs/setlevel.c in the source files provided on O'Reilly's FTP site. The new level is specified as an integer value between 1 and 8, inclusive. If it is set to 1, only messages of level 0 (KERN_EMERG) reach the console; if it is set to 8, all messages, including debugging ones, are displayed.

It is also possible to read and modify the console loglevel using the text file /proc/sys/kernel/printk. The file hosts four integer values: the current loglevel, the default level for messages that lack an explicit loglevel, the minimum allowed loglevel, and the boot-time default loglevel. Writing a single value to this file changes the current loglevel to that value; thus, for example, you can cause all kernel messages to appear at the console by simply entering:

 # echo 8 > /proc/sys/kernel/printk

It should now be apparent why the hello.c sample had the KERN_ALERT markers; they are there to make sure that the messages appear on the console.

4.2.2. Redirecting Console Messages

Linux allows for some flexibility in console logging policies by letting you send messages to a specific virtual console (if your console lives on the text screen). By default, the "console" is the current virtual terminal. To select a different virtual terminal to receive messages, you can issue ioctl(TIOCLINUX) on any console device. The following program, setconsole, can be used to choose which console receives kernel messages; it must be run by the superuser and is available in the misc-progs directory.

The following is the program in its entirety. You should invoke it with a single argument specifying the number of the console that is to receive messages.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/ioctl.h>

int main(int argc, char **argv)
{
   char bytes[2] = {11,0}; /* 11 is the TIOCLINUX cmd number */

   if (argc == 2) bytes[1] = atoi(argv[1]); /* the chosen console */
   else {
       fprintf(stderr, "%s: need a single arg\n",argv[0]); exit(1);
   }
   if (ioctl(STDIN_FILENO, TIOCLINUX, bytes)<0) {    /* use stdin */
       fprintf(stderr,"%s: ioctl(stdin, TIOCLINUX): %s\n",
               argv[0], strerror(errno));
       exit(1);
   }
   exit(0);
}

setconsole uses the special ioctl command TIOCLINUX, which implements Linux-specific functions. To use TIOCLINUX, you pass it an argument that is a pointer to a byte array. The first byte of the array is a number that specifies the requested subcommand, and the following bytes are subcommand specific. In setconsole, subcommand 11 is used, and the next byte (stored in bytes[1]) identifies the virtual console. The complete description of TIOCLINUX can be found in drivers/char/tty_io.c, in the kernel sources.

4.2.3. How Messages Get Logged

The printk function writes messages into a circular buffer that is __LOG_BUF_LEN bytes long: a value from 4 KB to 1 MB chosen while configuring the kernel. The function then wakes any process that is waiting for messages, that is, any process that is sleeping in the syslog system call or that is reading /proc/kmsg. These two interfaces to the logging engine are almost equivalent, but note that reading from /proc/kmsg consumes the data from the log buffer, whereas the syslog system call can optionally return log data while leaving it for other processes as well. In general, reading the /proc file is easier and is the default behavior for klogd. The dmesg command can be used to look at the content of the buffer without flushing it; actually, the command returns to stdout the whole content of the buffer, whether or not it has already been read.

If you happen to read the kernel messages by hand, after stopping klogd, you'll find that the /proc file looks like a FIFO, in that the reader blocks, waiting for more data. Obviously, you can't read messages this way if klogd or another process is already reading the same data, because you'll contend for it.

If the circular buffer fills up, printk wraps around and starts adding new data to the beginning of the buffer, overwriting the oldest data. Therefore, the logging process loses the oldest data. This problem is negligible compared with the advantages of using such a circular buffer. For example, a circular buffer allows the system to run even without a logging process, while minimizing memory waste by overwriting old data should nobody read it. Another feature of the Linux approach to messaging is that printk can be invoked from anywhere, even from an interrupt handler, with no limit on how much data can be printed. The only disadvantage is the possibility of losing some data.

If the klogd process is running, it retrieves kernel messages and dispatches them to syslogd, which in turn checks /etc/syslog.conf to find out how to deal with them. syslogd differentiates between messages according to a facility and a priority; allowable values for both the facility and the priority are defined in the system's syslog header. Kernel messages are logged by the LOG_KERN facility at a priority corresponding to the one used in printk (for example, LOG_ERR is used for KERN_ERR messages). If klogd isn't running, data remains in the circular buffer until someone reads it or the buffer overflows.

If you want to avoid clobbering your system log with the monitoring messages from your driver, you can either specify the -f (file) option to klogd to instruct it to save messages to a specific file, or customize /etc/syslog.conf to suit your needs. Yet another possibility is to take the brute-force approach: kill klogd and verbosely print messages on an unused virtual terminal, or issue the command cat /proc/kmsg from an unused xterm.

Tuesday, April 24, 2007

Experiments with Linux Process (Useful)

1. Introduction

Traditionally, a Unix process is divided into segments. The standard segments are code segment, data segment, BSS (block started by symbol), and stack segment.

The code segment contains the binary code of the program which is running as the process (a "process" is a program in execution). The data segment contains the initialized global variables and data structures. The BSS segment contains the uninitialized global data structures and finally, the stack segment contains the local variables, return addresses, etc. for the particular process.

Under Linux, a process can execute in two modes - user mode and kernel mode. A process usually executes in user mode, but can switch to kernel mode by making system calls. When a process makes a system call, the kernel takes control and does the requested service on behalf of the process. The process is said to be running in kernel mode during this time. When a process is running in user mode, it is said to be "in userland" and when it is running in kernel mode it is said to be "in kernel space". We will first have a look at how the process segments are dealt with in userland and then take a look at the bookkeeping on process segments done in kernel space.

2. Userland's view of the segments

The code segment consists of the code - the actual executable program. The code of all the functions we write in the program resides in this segment. The addresses of the functions will give us an idea where the code segment is. If we have a function foo() and let x be the address of foo (x = &foo), we know that x will point within the code segment.

The Data segment consists of the initialized global variables of a program. The Operating system needs to know what values are used to initialize the global variables. The initialized variables are kept in the data segment. To get the address of the data segment we declare a global variable and then print out its address. This address must be inside the data segment.

The BSS consists of the uninitialized global variables of a process. To get an address which occurs inside the BSS, we declare an uninitialized global variable, then print its address.

The automatic variables (or local variables) will be allocated on the stack, so printing out the addresses of local variables will provide us with the addresses within the stack segment.

3. A C program

Let's have a look at the following C program:

1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <sys/types.h>
4 #include <unistd.h>
5
6 int our_init_data = 30;
7 int our_noinit_data;
8
9 void our_prints(void)
10 {
11         int our_local_data = 1;
12         printf("\nPid of the process is = %d", getpid());
13         printf("\nAddresses which fall into:");
14         printf("\n 1) Data  segment = %p",
15                 &our_init_data);
16         printf("\n 2) BSS   segment = %p",
17                 &our_noinit_data);
18         printf("\n 3) Code  segment = %p",
19                 &our_prints);
20         printf("\n 4) Stack segment = %p\n",
21                 &our_local_data);
22
23         while(1);
24 }
25
26 int main()
27 {
28         our_prints();
29         return 0;
30 }

We can see that lines 6 and 7 declare two global variables. One is initialized and one is uninitialized. Per the previous discussion, the initialized variable will fall into the data segment and the uninitialized variable will fall into the BSS segment. Lines 14-17 print the addresses of the variables.

We also know that the address of the function our_prints will fall into the code segment, so that if we print the address of this function, we will get a value which falls into the code segment. This is done in lines 18-19.

Finally we print the address of a local variable. This automatic variable's address will be within the stack segment.

4. Execution of a userland program

When we execute a userland program, similar to the one given above, what happens is that the shell will fork() and exec() the new program. The exec() code inside the kernel will figure out what format the binary is in (ELF, a.out, etc.) and will call the corresponding handler for that format. For example when an ELF format file is loaded, the function load_elf_binary() from fs/binfmt_elf.c takes care of initializing the kernel data structures for the particular process. Details of this portion of loading will not be dealt with here, as that in itself is a topic for another article :-) The point here is that the code which loads the executable into the kernel fills in the kernel data structures.

5. Memory-related data structures in the kernel

In the Linux kernel, every process has an associated struct task_struct. The definition of this struct is in the header file include/linux/sched.h. The following snippet is from the 2.6.10 Linux kernel source code (only the needed fields and a few nearby fields are shown):

struct task_struct {
      volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
      struct thread_info *thread_info;
      atomic_t usage;
      ...
      ...
      ...
      struct mm_struct *mm, *active_mm;
      ...
      ...
      ...
      pid_t pid;
      ...
      ...
      ...
      char comm[16];
      ...
      ...
};

Three members of the data structure are relevant to us:

pid contains the Process ID of the process.
comm holds the name of the process.
The mm_struct within the task_struct is the key to all memory management activities related to the process.

The mm_struct is defined in include/linux/sched.h as:

struct mm_struct {
      struct vm_area_struct * mmap;           /* list of VMAs */
      struct rb_root mm_rb;
      struct vm_area_struct * mmap_cache;     /* last find_vma result */
      ...
      ...
      ...
      unsigned long start_code, end_code, start_data, end_data;
      unsigned long start_brk, brk, start_stack;
      ...
      ...
      ...
};

Here the first member of importance is the mmap. The mmap contains the pointer to the list of VMAs (Virtual Memory Areas) related to this process. Full usage of the process address space occurs very rarely. The sparse regions used are denoted by VMAs. So each VMA will contain information about a single region. The VMAs are stored in struct vm_area_struct defined in linux/mm.h:

struct vm_area_struct {
      struct mm_struct * vm_mm;       /* The address space we belong to. */
      unsigned long vm_start;         /* Our start address within vm_mm. */
      unsigned long vm_end;           /* The first byte after our end address
                                         within vm_mm. */
      ....
      ....
      ....
      /* linked list of VM areas per task, sorted by address */
      struct vm_area_struct *vm_next;
      ....
      ....
};

6. Kernel's view of the segments

The kernel keeps track of the segments which have been allocated to a particular process using the above structures. For each segment, the kernel allocates a VMA. It keeps track of these segments in the mm_struct structures.

The kernel tracks the data segment using two variables: start_data and end_data. The code segment boundaries are in the start_code and end_code variables. The stack segment is covered by the single variable start_stack. There is no special variable to keep track of the BSS segment — the VMA corresponding to the BSS accounts for it.

7. A kernel module

Let's have a look at the code for a kernel module:

1 #include <linux/kernel.h>
2 #include <linux/module.h>
3 #include <linux/init.h>
4 #include <linux/sched.h>
5 #include <linux/mm.h>
6
7 static int pid_mem = 1;
8
9 static void print_mem(struct task_struct *task)
10 {
11         struct mm_struct *mm;
12         struct vm_area_struct *vma;
13         int count = 0;
14         mm = task->mm;
15         printk("\nThis mm_struct has %d vmas.\n", mm->map_count);
16         for (vma = mm->mmap ; vma ; vma = vma->vm_next) {
17                 printk ("\nVma number %d: \n", ++count);
18                 printk("  Starts at 0x%lx, Ends at 0x%lx\n",
19                           vma->vm_start, vma->vm_end);
20         }
21         printk("\nCode  Segment start = 0x%lx, end = 0x%lx \n"
22                  "Data  Segment start = 0x%lx, end = 0x%lx\n"
23                  "Stack Segment start = 0x%lx\n",
24                  mm->start_code, mm->end_code,
25                  mm->start_data, mm->end_data,
26                  mm->start_stack);
27 }
28
29 static int mm_exp_load(void){
30         struct task_struct *task;
31         printk("\nGot the process id to look up as %d.\n", pid_mem);
32         for_each_process(task) {
33                 if ( task->pid == pid_mem) {
34                         printk("%s[%d]\n", task->comm, task->pid);
35                         print_mem(task);
36                 }
37         }
38         return 0;
39 }
40
41 static void mm_exp_unload(void)
42 {
43         printk("\nPrint segment information module exiting.\n");
44 }
45
46 module_init(mm_exp_load);
47 module_exit(mm_exp_unload);
48 module_param(pid_mem, int, 0);
49
50 MODULE_AUTHOR ("Krishnakumar. R, rkrishnakumar@gmail.com");
51 MODULE_DESCRIPTION ("Print segment information");
52 MODULE_LICENSE("GPL");

The module accepts the pid of the process it should dissect as its parameter (line 48). It goes through the list of processes in the kernel (lines 32-37), and when it finds the required pid, it calls the function print_mem, which prints the details from the kernel's memory-management data structures.

8. Let us get into execution mode

I ran the C program given in the earlier section and, while it was still running, loaded the kernel module with the pid of the process. Please note that the program was compiled statically (-static) rather than dynamically, to avoid the unnecessary complication of shared libraries. Here is what I got:

# ./print_segments &
Pid of the process is = 3283
Addresses which fall into:
1) Data  segment = 0x80a000c
2) BSS   segment = 0x80a1a10
3) Code  segment = 0x80481f4
4) Stack segment = 0xbffff8e4

# /sbin/insmod print_kern_ds.ko pid_mem=3283

Got the process id to look up as 3283.
print_segments[3283]

This mm_struct has 5 vmas.

Vma number 1:
Starts at 0x8048000, Ends at 0x80a0000

Vma number 2:
Starts at 0x80a0000, Ends at 0x80a1000

Vma number 3:
Starts at 0x80a1000, Ends at 0x80c3000

Vma number 4:
Starts at 0xb7fff000, Ends at 0xb8000000

Vma number 5:
Starts at 0xbffff000, Ends at 0xc0000000

Code  Segment start = 0x8048000, end = 0x809fc38
Data  Segment start = 0x80a0000, end = 0x80a0ec4
Stack Segment start = 0xbffffb30

Let's analyze the output. According to the userland program, the address 0x80a000c should fall into the data segment. The kernel module confirms this: the address lies between the data segment's start (0x80a0000) and end (0x80a0ec4), inside VMA number 2. The code segment starts at 0x8048000 and ends at 0x809fc38 according to the kernel data structures, and the userland program reports 0x80481f4 as a code address, which lies inside that range (VMA number 1). So the userland and kernel views tally.

Now let's look at the stack segment: the userland program says that the address 0xbffff8e4 should fall into it, and the kernel data structures state that the stack starts at 0xbffffb30. On a 386-based architecture the stack grows downwards, so a local variable at 0xbffff8e4, below start_stack, is consistent, and both addresses lie inside VMA 5 (0xbffff000 to 0xc0000000). The BSS is not stored in any particular variable of the kernel, but there is a VMA allocated for the corresponding location: from the userland program, the address 0x80a1a10 should come inside the BSS, and a look at VMA 3 makes it clear that this is the corresponding VMA for the BSS.

9. Gathering Information from /proc

We have been using custom programs to explore the contents of the data structures inside the kernel, but the kernel provides a standard interface for us to access such information. The memory maps of a particular process can be obtained by doing a 'cat /proc/<pid>/maps', where <pid> is the pid of the process whose details we need. When I ran it, the program used pid 3283; here is the memory map, trimmed to fit:

# cat /proc/3283/maps | cut -f1 -d' '
08048000-080a0000
080a0000-080a1000
080a1000-080c3000
b7fff000-b8000000
bffff000-c0000000

10. Conclusion

We have looked at the userland perspective of how the segments are treated for a program. Then we examined the data structures in the kernel which keep track of the segments. We verified that our assumptions are correct using userland and kernel programs. Finally we used the standard kernel interface to obtain information regarding the memory regions of a specific process.

Tuesday, April 3, 2007

2.4.20 kernel compilation

Compiling a Linux 2.4 kernel.

The currently running kernel is 2.4.20-8, and recompiling it produces 2.4.20-8custom. The steps are:

Step 1: make xconfig

Step 2: make dep (the configuration step tells you to run this next)

Step 3: make bzImage (bzImage is the compressed kernel image; it is renamed to vmlinuz when it is installed)

Step 4: make modules

Step 5: make modules_install (This creates the directory /lib/modules/2.4.20-8custom. This directory contains a 'build' entry that is a symbolic link to /usr/src/linux. The same is true of /lib/modules/2.4.20-8/build.)

It is necessary to run make modules_install because a running kernel looks for its modules in /lib/modules/<version of the running kernel>/.

So, if you boot the new kernel (2.4.20-8custom) without having run 'make modules_install', it will not find /lib/modules/2.4.20-8custom, will therefore not be able to load its modules, and will not boot up properly (it may not boot up at all).

However, I hacked around this by changing the EXTRAVERSION variable in the top-level Makefile (from -8custom to -8) so that the version of the newly compiled kernel would still be the same (2.4.20-8), and I was able to boot the new kernel. There was one glitch: I couldn't build the initrd, so I compiled the ext3 filesystem into the kernel rather than as a module. The kernel booted up fine after that.

The directories /lib/modules/2.4.20-8 and /lib/modules/2.4.20-8custom look almost the same.

Step 6: make install (There is no need to build anything else first if you have already made bzImage. It makes vmlinux again (already made in 'make bzImage'), and then ultimately runs the 'install.sh' script in the /usr/src/linux/arch/i386/boot directory.)

This script basically copies bzImage (renamed to vmlinuz-XXX), and System.map file from /usr/src/linux directory to the /boot directory. The parameters to this script can be read in the script. One of the parameters is the version of the new kernel.

If the '/sbin/installkernel' script is present, install.sh calls it and 'installkernel' does all these tasks. The 'installkernel' script also makes the initial ramdisk image (initrd) using the 'mkinitrd' command. This is needed if the 'ext3' filesystem is compiled as a module. If the 'ext3' filesystem is built into the kernel, then the initrd is not needed. This script also updates the 'grub.conf' file in the /boot/grub directory. The mkinitrd invocation looks something like this:

mkinitrd [-f] initrd_image_name current_kernel_version

The initrd image is always named initrd-$(current_kernel_version).img. Our version is 2.4.20-8custom, so our initrd image is named initrd-2.4.20-8custom.img.

Initrd image can be made only if the directory /lib/modules/2.4.20-8custom exists. For any initrd-XXX.img image, the /lib/modules/XXX directory should exist.

For the system to be able to load up the modules successfully, the /lib/modules/XXX directory should be present (The kernel version is also XXX). The modules to be loaded are loaded from this directory. If your sound driver is compiled as a module and if the /lib/modules/XXX directory doesn't exist then the sound driver will not be loaded when you try to play a sound.
You may get an error saying "The driver could not be loaded". I have got this error. The device has been recognized but the driver could not be loaded.

(Alternate) Step 6: In case you don't want to run 'make install' then do the following:

(Alternate) Step 6.1: Move vmlinuz-XXX, initrd-XXX, System.map-XXX, to .old in the /boot directory. Don't overwrite these files. Update grub.conf so that you have an old running kernel (in case your new kernel doesn't boot up).

(Alternate) Step 6.2: cp /usr/src/linux/arch/i386/boot/bzImage /boot/vmlinuz-XXX

(Alternate) Step 6.3: cp /usr/src/linux/System.map /boot/System.map-XXX

(Alternate) Step 6.4: Create a symbolic link System.map to System.map-XXX in the /boot directory.

(Alternate) Step 6.5: Make initrd in /boot directory:
mkinitrd initrd-XXX.img XXX

(Alternate) Step 6.6: Update /boot/grub/grub.conf accordingly

(Alternate) Step 6.7: In case initrd is not used, you have to build ext3 in the kernel and the kernel 'root' parameter variable in grub.conf would be the actual device name. (root=/dev/hdaX).