Bahadir Balban wrote:
1) For a kernel that has high memory starting at 896MB, and processes having pages in high memory, when there's data exchange from those user pages to the kernel or vice versa, does the kernel have to first map those addresses (i.e. modify page tables), and then access them, even if the pages are in-memory?
Yes, via kmap() or kmap_atomic(). Your next question will be if that's not horribly slow. See below.
2) Since a 4GB space is addressable in a 32-bit system, why would the kernel maintain a 1GB logical space only? LDD3 page 415 says "the biggest consumer of kernel address space is virtual mappings for physical memory." Does this mean the page table entries that keep this mapping consume a lot of kernel space and that's why logical space is kept low?
No. So as to not have to switch pagetables (and therefore flush the TLB, the on-CPU pagetable cache, which is a very costly operation) upon each entry to and exit from the kernel, kernel and user address space share that same 4GB. Userspace normally gets 3GB of it -- you supposedly bought the machine to run applications and not so much a kernel -- leaving 1GB for the kernel. Subtract 128M of address space, which is reserved for things like vmalloc() and ioremap() (and high memory mappings...), and you're at that familiar 896M.
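To make the arithmetic concrete, here is a small userspace C sketch of it. The function names are mine and purely illustrative, not anything from the kernel; the 3G split and the 128M reserved area are the figures from the paragraph above.

```c
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

/* Kernel's share of a 32-bit (4G) address space, given the address
 * where the user/kernel split sits (0xC0000000, i.e. 3G, by default). */
uint64_t kernel_space_bytes(uint64_t page_offset)
{
    return 4 * GiB - page_offset;
}

/* What remains for permanently mapped ("low") memory once the area
 * reserved for vmalloc(), ioremap() and friends is subtracted. */
uint64_t lowmem_bytes(uint64_t page_offset, uint64_t reserved)
{
    return kernel_space_bytes(page_offset) - reserved;
}
```

With page_offset = 0xC0000000 and reserved = 128 * MiB, kernel_space_bytes() gives 1G and lowmem_bytes() gives 896M.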
Now, there has been quite a period where 896M was really a lot. All my systems up to the P2, with 64M in them, get cold chills running down their little spines even _thinking_ about addressing that much. In practice the sharing seemed to only have upsides: things were much faster than they could've been had a TLB flush been necessary upon each transition to/from the kernel, and only a tiny, specialized fraction of machines ran with insane amounts of memory anyway.
Then, of course, consumer x86 lost its marbles as well and the problem was upon all of us, but by that time the "thou shalt not flush the TLB" mantra was strong enough that nobody wanted to rework things much. Instead, we went with the high-memory system: the lower part (896M) of memory is still permanently mapped as was always the case, and you map memory above that into kernel space when required, and unmap it when no longer needed.
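As a toy model of that distinction (userspace C, not kernel code; the names are made up for illustration): with the default 3G split, a physical address below 896M has a fixed kernel virtual address at PAGE_OFFSET plus the physical address, while anything above has none until it is temporarily mapped.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET_3G 0xC0000000ULL   /* default 3G/1G split */
#define LOWMEM_LIMIT   (896ULL << 20)  /* 896M of permanently mapped memory */

/* True if the physical address lies in permanently mapped low memory. */
bool is_lowmem(uint64_t phys)
{
    return phys < LOWMEM_LIMIT;
}

/* Kernel virtual address of a lowmem physical address, or 0 to signal
 * "high memory: no permanent mapping, must be mapped on demand". */
uint64_t lowmem_virt(uint64_t phys)
{
    return is_lowmem(phys) ? PAGE_OFFSET_3G + phys : 0;
}
```

In the real kernel the "map on demand" branch is what kmap() and kmap_atomic() are for; this sketch only shows which addresses need it.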
Which works. It's certainly not the cleanest approach if you're the kind who likes neat schematic drawings of algorithms (I am. Oh am I ever) but it works. It's also not very fast, but when a TLB flush is the alternative it doesn't easily get worse. Moreover, by the time the masses really started running x86 with more than 896M (a year ago to, well, now I guess?) x86-64 was "upon us" as well, and 64-bit arches obviously do not share this problem. Or at least not for a _very_ long time still...
There are also the famous "4G/4G" patches: that code does in fact switch pagetables and can thereby give both the user and the kernel its own full 4G address space. Reports on the speed penalty have been mixed. The code might have made it into -mm (Andrew Morton's testing tree) but I'm not certain about that. As to the future, certainly after x86-64 (or another 64-bit arch) truly obsoletes x86, I personally believe it might be worth it to always use 4G/4G (or at least for machines with more than 896M), say "sorry, guys, we don't support more than 4G on x86 anymore", and rip out the highmem code. That would certainly make for a more easily maintainable VM, and it will probably need to be maintained for a long time still for embedded use.
Another thing, which is significantly easier to do: adjust the split down somewhat. I've been told it's against some SysV ABI to go beneath 3G for userspace but chances are good you won't care too much. For machines with 1G to have to cope with highmem just to get that last 128M supported is fairly icky. If in include/asm/page.h you adjust the __PAGE_OFFSET define(s) down a bit, that should be all you need. From 3G (0xc0000000) to 0xb8000000 (3G-128M) or 0xb0000000 for good measure. A patch which does this for you also lives in the -ck tree, available at:
http://members.optusnet.com.au/ckolivas/kernel/
Hope this was useful...
Rene.