Linux memory layout (was kernel address space)

Rahul Iyer <idlisambar@xxxxxxxxx> · Fri, 02 Sep 2005 17:27:08 -0400

Hi All,
I followed the kernel address space thread and, unfortunately, found
its flow rather confusing. So at the risk of repetition, I'll try to
explain this from scratch. Some of it might be very elementary, so bear
with me. Also, since I know precious little about any other
architecture, what I write pertains to x86 ONLY.

OK, so an x86 is a 32-bit processor. Hence, the maximum memory that can
be addressed is 2^32 bytes = 4GB. This 4GB range of addresses is called
the address space, and the addresses in it are called virtual
addresses. Now, to access physical memory, every process must go
through the paging system (provided paging is turned on, which it is in
Linux). Also, in order to be able to access *any* physical page, that
page *must* be mapped in your page tables. Every process has its own
page tables.
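
To make the page table lookup a bit more concrete, here is a small
Python sketch of how a 32-bit virtual address is carved up. It assumes
classic two-level x86 paging with 4KB pages and no PAE: 10 bits of page
directory index, 10 bits of page table index, 12 bits of offset. The
name split_vaddr is made up for illustration; it is not a kernel
function.

```python
PAGE_SHIFT = 12           # 4KB pages: the low 12 bits are the offset in the page
PGDIR_SHIFT = 22          # the top 10 bits select a page directory entry
ADDRESS_SPACE = 1 << 32   # 2^32 bytes = 4GB of virtual addresses


def split_vaddr(vaddr):
    """Split a 32-bit virtual address into (pgd index, pte index, offset).

    Assumes two-level paging with 4KB pages (no PAE).
    """
    if not 0 <= vaddr < ADDRESS_SPACE:
        raise ValueError("not a 32-bit address")
    pgd_index = vaddr >> PGDIR_SHIFT            # which page directory entry
    pte_index = (vaddr >> PAGE_SHIFT) & 0x3FF   # which entry in that page table
    offset = vaddr & 0xFFF                      # byte offset inside the page
    return pgd_index, pte_index, offset
```

If either of the two entries selected this way is not present, the
address is simply not mapped for that process.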

Now, the kernel has its own code and data in some *physical* pages, and
each user process has its code, data, etc. in some *physical* pages. In
order to be able to access these physical pages, they must be mapped in
the page tables. How is this done?

The first 3G worth of virtual addresses are mapped to the physical pages
of the process, i.e., the userland pages. The remaining 1G is used to
map the physical pages containing the kernel's stuff. So, every process
has in its page tables the first 3G of virtual addresses mapped to its
own pages and the last 1G mapped to the kernel's pages. This is commonly
called the 3G/1G split.
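
As a rough illustration of the split, the sketch below classifies a
virtual address as user or kernel. It assumes the default 3G/1G
boundary at 0xC0000000 (what the kernel calls PAGE_OFFSET on 32-bit
x86); the helper name region is hypothetical.

```python
PAGE_OFFSET = 0xC0000000   # assumed boundary: first kernel address with a 3G/1G split
TOP = 1 << 32              # one past the last 32-bit virtual address

USER_SIZE = PAGE_OFFSET            # 3G of user addresses, starting at 0
KERNEL_SIZE = TOP - PAGE_OFFSET    # the remaining 1G belongs to the kernel


def region(vaddr):
    """Return "user" or "kernel" for a 32-bit virtual address."""
    if not 0 <= vaddr < TOP:
        raise ValueError("not a 32-bit address")
    return "user" if vaddr < PAGE_OFFSET else "kernel"
```

For example, region(0x08048000) gives "user" and region(0xC0100000)
gives "kernel", whichever process happens to be running.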

Now, since the kernel needs to be able to access all of physical memory
and has only 1G of address space to map it in, the kernel can directly
access only 1G of physical memory. This places an upper limit on the
amount of RAM a machine can use. Some people use a 2G/2G split instead,
that is, the first 2G is userspace and the next 2G is kernel space,
which lets the kernel map more RAM at the cost of user address space.
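
The arithmetic behind that limit can be sketched as follows. The kernel
maps physical address P at virtual address P + PAGE_OFFSET (this is
what its __va() and __pa() macros compute), so whatever does not fit
between PAGE_OFFSET and 4GB cannot be mapped this way. The Python names
below are made up for illustration, and the sketch ignores any part of
kernel space set aside for other uses.

```python
TOP = 1 << 32                  # one past the last 32-bit virtual address
SPLIT_3G_1G = 0xC0000000       # kernel space starts at 3G
SPLIT_2G_2G = 0x80000000       # kernel space starts at 2G


def max_direct_map(page_offset):
    """Bytes of physical memory that fit between page_offset and 4GB."""
    return TOP - page_offset


def phys_to_virt(paddr, page_offset=SPLIT_3G_1G):
    """Kernel virtual address for a physical address (same idea as __va)."""
    if not 0 <= paddr < max_direct_map(page_offset):
        raise ValueError("physical address does not fit in kernel space")
    return paddr + page_offset


def virt_to_phys(vaddr, page_offset=SPLIT_3G_1G):
    """Physical address behind a kernel virtual address (same idea as __pa)."""
    if not page_offset <= vaddr < TOP:
        raise ValueError("not a kernel virtual address")
    return vaddr - page_offset
```

With the 3G/1G split, physical address 1G is already out of reach of
phys_to_virt; with the 2G/2G split it maps to 0xC0000000.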

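A way to access more RAM than the kernel can map directly is to reserve
the last 128MB of the 1G of kernel space addresses for temporary
mappings. These are set up on the fly, as and when needed, to reach
physical pages that have no permanent kernel mapping. That leaves
1024MB - 128MB = 896MB for the permanent mapping. That's where the
magical 896MB comes from.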
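
Here is the same arithmetic as a tiny Python sketch. It shows how much
of a machine's RAM would be permanently mapped and how much would have
to go through the temporary mappings (usually called lowmem and highmem
respectively). The 1G and 128MB figures are the ones from the text; the
function name split_ram is hypothetical.

```python
MB = 1 << 20

KERNEL_SPACE = 1024 * MB       # 1G of kernel virtual addresses (3G/1G split)
TEMP_MAP_RESERVE = 128 * MB    # top of kernel space kept for temporary mappings
LOWMEM_LIMIT = KERNEL_SPACE - TEMP_MAP_RESERVE   # 1024MB - 128MB = 896MB


def split_ram(ram_bytes):
    """Return (lowmem, highmem) in bytes for a machine with ram_bytes of RAM."""
    lowmem = min(ram_bytes, LOWMEM_LIMIT)   # permanently mapped by the kernel
    highmem = ram_bytes - lowmem            # reachable only via temporary mappings
    return lowmem, highmem
```

So a 512MB box has no highmem at all, while a 2GB box ends up with
896MB of lowmem and 1152MB of highmem.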

Some kernels, like Red Hat's from what I read on this list, have a
separate page table for the kernel. That is, the kernel has a *separate*
page table with all 4G of addresses, and the user processes too have
page tables with all 4GB belonging to the user process. The problem,
IMHO, with this is that every switch from userspace to kernel space
means switching page tables, which involves a TLB flush. This is bad
for performance. Any comments? Rik, Arjan?

Hope that helped
Thanks
Rahul

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/