On 9/3/05, Rahul Iyer <idlisambar@xxxxxxxxx> wrote:
> Hi All,
> I followed the kernel address space thread and, unfortunately, by the
> flow of it, found it rather confusing. So at the risk of repetition,
> I'll try to explain this from scratch. Some of it might be very
> elementary, so bear with me. Also, since I know precious little about
> any other architecture, what I write pertains to x86 ONLY.
>
> OK, so an x86 is a 32-bit processor. Hence, the maximum memory that
> can be addressed is 2^32 = 4GB.

Actually, the maximum *physical* memory that can be addressed is indeed
4GB... but with PAE (Physical Address Extensions), it can grow up to
64GB. Let us leave PAE aside for the moment, though.

> This 4GB set of addressable addresses
> (redundant, I know) is called the address space, and the addresses are
> called virtual addresses. Now, to access physical memory, every
> process must go through the paging system (provided paging is turned
> on, which it is in Linux).

Actually, it *must* go through segmentation before paging... but since
Linux (as well as NT) adopts the basic flat model, segmentation is in
practice disabled. The x86 ISA does not allow segmentation to be turned
off completely.

> Also, in order to be able to access *any* physical
> page, that page *must* be in your page tables. Every process has its
> own page tables.

The kernel's pages are shared by all processes and have the 'G'
(global) bit set in their page table/directory entries. This keeps the
TLB flush on a context switch (the reload of CR3, in fact) from
evicting the entries associated with the kernel.

> Now, the kernel has its own code and data in some *physical* pages,

Actually, the kernel's code is mapped with two 4MB pages on x86
machines with PSE (Page Size Extensions) support, both to reduce TLB
misses on kernel code and to reduce contention for the 4KB-page TLB
(4MB pages have a separate TLB on x86).

> and
> each user process has its code, data etc. in some *physical* pages.
> In order to be able to access these physical pages, these physical
> pages must be mapped in the page tables. How is this done?

Do you mean the kernel functions where this is done, or how it is done
as per the x86 architecture?

> The first 3G worth of virtual addresses are mapped to the physical
> pages of the process, i.e. the userland pages. The remaining 1G is
> used to map the physical pages containing the kernel's stuff. So,
> every process has in its page tables the first 3G of virtual
> addresses mapped to its own pages and the last 1G mapped to the
> kernel's pages. This is commonly called the 3G/1G split.
>
> Now, since the kernel needs to be able to access all of physical
> memory and the kernel has only 1G of address space, the kernel can
> access only 1G of physical memory. This places an upper limit on the
> amount of RAM a machine could have. Some people use a 2G/2G split,
> that is, the first 2G is userspace and the next 2G is kernel space.
>
> A way to access >2GB is to reserve the last 128MB of the 1G of
> kernel space addresses for temporary mappings.

This 128MB is not just for reaching memory beyond 2GB (or 1GB). As
memory keeps getting allocated and freed, the 896MB of directly mapped
kernel memory becomes fragmented. Now what happens if I want to
allocate 1 or 2MB of contiguous memory to load a kernel module? I
won't get that space, right? So I allocate it with *vmalloc*, which
hands out memory that is contiguous only in the kernel's virtual
address space. (Note my deliberate choice of a kernel module as the
vmalloc example: you cannot use vmalloc for a buffer you want to do
device I/O on, as that I/O does NOT pass through the processor's MMU
to get its addresses converted.) A minimal sketch follows below.
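To make that concrete, here is a minimal, untested sketch of such a
module (the module and the names in it are made up for illustration;
vmalloc()/vfree() are the real interfaces):

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/vmalloc.h>

static void *vbuf;

static int __init vmalloc_example_init(void)
{
	/* 2MB that is contiguous only *virtually*: this can succeed
	 * even when physical memory is too fragmented to provide
	 * 2MB of physically contiguous pages. */
	vbuf = vmalloc(2 * 1024 * 1024);
	if (!vbuf)
		return -ENOMEM;
	printk(KERN_INFO "vmalloc area address: %p\n", vbuf);
	return 0;
}

static void __exit vmalloc_example_exit(void)
{
	vfree(vbuf);
}

module_init(vmalloc_example_init);
module_exit(vmalloc_example_exit);
MODULE_LICENSE("GPL");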
> These are done on the fly, as
> and when needed. 1024MB - 128MB = 896MB. That's where the magical
> 896MB comes from.

Those on-the-fly mappings are what interfaces like kmap() hand out;
see the sketch in the P.S. at the bottom of this mail.

> Some kernels, like Redhat's from what I read on this list, have a
> separate page table for the kernel. That is, the kernel has a
> *separate* page table with 4G of addresses, and the user processes
> too have page tables with all 4GB belonging to the user process. The
> problem, IMHO, with this is that every switch from userspace to
> kernel space involves a TLB flush. This is bad for performance.

Extremely bad for performance on some architectures... Some
architectures, like SPARC v9 running Solaris (I never worked on Linux
on SPARC v9), *always* use this kind of mechanism. That is because of
fundamental differences between the SPARC and x86 architectures... If
you are on x86 and short of address space, the best option is to move
to AMD64 (x86_64).

> Any comments? Rik, Arjan?
>
> Hope that helped
> Thanks
> Rahul

--
The difference between Theory and Practice is more so in Practice than
in Theory.
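P.S. The kmap() sketch promised above (again untested and made up for
illustration; alloc_page()/kmap()/kunmap() are the real interfaces).
kmap() borrows a kernel virtual address out of that reserved 128MB
window to reach a page that has no permanent kernel mapping:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/string.h>

static int __init kmap_example_init(void)
{
	struct page *page;
	void *vaddr;

	/* May hand back a page above the 896MB low-memory limit,
	 * i.e. one the kernel has no permanent mapping for. */
	page = alloc_page(GFP_HIGHUSER);
	if (!page)
		return -ENOMEM;

	/* Map it on the fly into the reserved window... */
	vaddr = kmap(page);
	memset(vaddr, 0, PAGE_SIZE);
	/* ...and give the virtual address back when done. */
	kunmap(page);

	__free_page(page);
	return 0;
}

static void __exit kmap_example_exit(void)
{
}

module_init(kmap_example_init);
module_exit(kmap_example_exit);
MODULE_LICENSE("GPL");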