On 9/3/05, Rahul Iyer <idlisambar@xxxxxxxxx> wrote:
> Hi All,
> I followed the kernel address space thread and, unfortunately, by the
> flow of it, found it rather confusing. So at the risk of repetition,
> I'll try to explain this from scratch. Some of it might be very
> elementary, so bear with me. Also, since I know precious little about
> any other architecture, what I write pertains to x86 ONLY.
>
> OK, so an x86 is a 32-bit processor. Hence, the maximum memory that
> can be addressed is 2^32 = 4GB.

Actually, the maximum *physical* memory that can be addressed is indeed
4GB... but with PAE (Physical Address Extensions), it can grow up to
64GB. Let us leave PAE aside for the moment, though.

> This 4GB set of addressable addresses
> (redundant, I know) is called the address space, and the addresses are
> called virtual addresses. Now, to access physical memory, every
> process must go through the paging system (provided paging is turned
> on, which it is in Linux).

Actually, it *must* go through segmentation before paging... but since
Linux (as well as NT) adopts the basic flat model, segmentation is in
practice disabled. The x86 ISA does not allow segmentation to be turned
off completely.

> Also, in order to be able to access *any* physical
> page, that page *must* be in your page tables. Every process has its
> own page tables.

The kernel's pages are shared by all processes and have the 'G'
(global) bit set in their page table/directory entries. This keeps the
TLB flush on a context switch (the reload of CR3, in fact) from
evicting the entries associated with the kernel.

> Now, the kernel has its own code and data in some *physical* pages,

Actually, the kernel's code is mapped with two 4MB pages on x86
machines with PSE (Page Size Extensions) support, both to reduce TLB
misses on kernel code and to reduce contention for the 4KB-page TLB
(4MB pages have a separate TLB on x86).

> and
> each user process has its code, data etc. in some *physical* pages.
> In order to be able to access these physical pages, these physical
> pages must be mapped in the page tables. How is this done?

Do you mean the kernel functions where this is done, or how it is done
as per the x86 architecture?

> The first 3G worth of virtual addresses are mapped to the physical
> pages of the process, i.e. the userland pages. The remaining 1G is
> used to map the physical pages containing the kernel's stuff. So,
> every process has in its page tables the first 3G of virtual
> addresses mapped to its own pages and the last 1G mapped to the
> kernel's pages. This is commonly called the 3G/1G split.
>
> Now, since the kernel needs to be able to access all of physical
> memory and the kernel has only 1G of address space, the kernel can
> access only 1G of physical memory. This places an upper limit on the
> amount of RAM a machine could have. Some people use a 2G/2G split,
> that is, the first 2G is userspace and the next 2G is kernel space.
>
> A way to access >2GB is to reserve the last 128MB of the 1G of
> kernel space addresses for temporary mappings.

This 128MB is not just for reaching memory beyond 2GB (or 1GB). As
memory keeps getting allocated and freed, the 896MB of directly mapped
kernel memory becomes fragmented. Now what happens if I want to
allocate 1 or 2MB of contiguous memory to load a kernel module? I
won't get that space, right? So I allocate it with *vmalloc*, which
hands out memory that is contiguous only in the kernel's virtual
address space. (Note my deliberate choice of a kernel module as the
vmalloc example: you cannot use vmalloc for a buffer you want to do
device I/O on, as that I/O does NOT pass through the processor's MMU
to get its addresses converted.) A minimal sketch follows below.
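To make that concrete, here is a minimal, untested sketch of such a
module (the module and the names in it are made up for illustration;
vmalloc()/vfree() are the real interfaces):

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/vmalloc.h>

static void *vbuf;

static int __init vmalloc_example_init(void)
{
	/* 2MB that is contiguous only *virtually*: this can succeed
	 * even when physical memory is too fragmented to provide
	 * 2MB of physically contiguous pages. */
	vbuf = vmalloc(2 * 1024 * 1024);
	if (!vbuf)
		return -ENOMEM;
	printk(KERN_INFO "vmalloc area address: %p\n", vbuf);
	return 0;
}

static void __exit vmalloc_example_exit(void)
{
	vfree(vbuf);
}

module_init(vmalloc_example_init);
module_exit(vmalloc_example_exit);
MODULE_LICENSE("GPL");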
> These are done on the fly, as
> and when needed. 1024MB - 128MB = 896MB. That's where the magical
> 896MB comes from.

Those on-the-fly mappings are what interfaces like kmap() hand out;
see the sketch in the P.S. at the bottom of this mail.

> Some kernels, like Redhat's from what I read on this list, have a
> separate page table for the kernel. That is, the kernel has a
> *separate* page table with 4G of addresses, and the user processes
> too have page tables with all 4GB belonging to the user process. The
> problem, IMHO, with this is that every switch from userspace to
> kernel space involves a TLB flush. This is bad for performance.

Extremely bad for performance on some architectures... Some
architectures, like SPARC v9 running Solaris (I never worked on Linux
on SPARC v9), *always* use this kind of mechanism. That is because of
fundamental differences between the SPARC and x86 architectures... If
you are on x86 and short of address space, the best option is to move
to AMD64 (x86_64).

> Any comments? Rik, Arjan?
>
> Hope that helped
> Thanks
> Rahul

--
The difference between Theory and Practice is more so in Practice than
in Theory.
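P.S. The kmap() sketch promised above (again untested and made up for
illustration; alloc_page()/kmap()/kunmap() are the real interfaces).
kmap() borrows a kernel virtual address out of that reserved 128MB
window to reach a page that has no permanent kernel mapping:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/string.h>

static int __init kmap_example_init(void)
{
	struct page *page;
	void *vaddr;

	/* May hand back a page above the 896MB low-memory limit,
	 * i.e. one the kernel has no permanent mapping for. */
	page = alloc_page(GFP_HIGHUSER);
	if (!page)
		return -ENOMEM;

	/* Map it on the fly into the reserved window... */
	vaddr = kmap(page);
	memset(vaddr, 0, PAGE_SIZE);
	/* ...and give the virtual address back when done. */
	kunmap(page);

	__free_page(page);
	return 0;
}

static void __exit kmap_example_exit(void)
{
}

module_init(kmap_example_init);
module_exit(kmap_example_exit);
MODULE_LICENSE("GPL");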