Wow. Although I'm not the one who asked the question, I'll still take the
liberty of praising your answer and thanking you!

On 11/26/05, Rene Herman <rene.herman@xxxxxxxxxxxx> wrote:
> Bahadir Balban wrote:
>
> > 1) For a kernel that has high memory starting at 896MB, and processes
> > having pages in high memory, when there's data exchange from those
> > user pages to the kernel or vice versa, does the kernel have to first
> > map those addresses (i.e. modify page tables), and then access them,
> > even if the pages are in-memory?
>
> Yes, via kmap() or kmap_atomic(). Your next question will be if that's
> not horribly slow. See below.
>
> > 2) Since a 4GB space is addressable in a 32-bit system, why would the
> > kernel maintain a 1GB logical space only? LDD3 page 415 says "the
> > biggest consumer of kernel address space is virtual mappings for
> > physical memory." Does this mean the page table entries that keep this
> > mapping consume a lot of kernel space and that's why logical space is
> > kept low?
>
> No. So as to not have to switch pagetables (and therefore flush the TLB,
> the on-CPU pagetable cache, which is a very costly operation) upon each
> entry to and exit from the kernel, kernel and user address space share
> that same 4GB. Userspace normally gets 3GB of it -- you supposedly
> bought the machine to run applications and not so much a kernel --
> leaving 1GB for the kernel. Subtract 128M of address space which is
> reserved for things like vmalloc() and ioremap() (and high memory
> mappings...) and you're at that familiar 896M.
>
> Now, there has been quite a period where 896M was really a lot. All my
> systems up to P2, with 64M in them, get cold chills running down their
> little spines even _thinking_ about addressing that much.
> In practice the sharing seemed to only have upsides: things were much
> faster than they could've been had a TLB flush been necessary upon each
> transition to/from the kernel, and only a tiny, specialized fraction of
> machines ran with insane amounts of memory anyway.
>
> Then, of course, consumer x86 lost its marbles as well and the problem
> was upon all of us but by that time, the "thou shalt not flush the TLB"
> mantra was strong enough to not base things around so much, but to go
> with the high-memory system: the lower part (896M) of memory is still
> permanently mapped as was always the case, and you map memory above that
> into kernel space when required, and unmap it when no longer needed.
>
> Which works. It's certainly not the cleanest approach if you're the kind
> who likes neat schematic drawings of algorithms (I am. Oh am I ever) but
> it works. It's also not very fast, but when a TLB flush is the
> alternative it doesn't easily get worse. Moreover, by the time the
> masses really started running x86 with more than 896M (a year ago to,
> well, now I guess?) x86-64 was "upon us" as well, and 64-bit arches
> obviously do not share this problem. Or at least not for a _very_ long
> time still...
>
> There are also the famous "4G/4G" patches: that code does in fact switch
> pagetables and can thereby give both the user and the kernel its own
> full 4G address space. Reports on the speed penalty have been mixed. The
> code might have made it into -mm (Andrew Morton's testing tree) but I'm
> not certain about that. As to the future, certainly after x86-64 (or
> another 64-bit arch) truly obsoletes x86 I personally believe it might be
> worth it to always use 4G/4G (or at least for machines with more than
> 896M), say "sorry, guys, we don't support more than 4G on x86 anymore",
> and rip out the highmem code. Would certainly make for a more easily
> maintainable VM, and it will probably need to be maintained for a long
> time still for embedded use.
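
The address-space arithmetic in the answer above (4G in total, 3G for
userspace, 1G for the kernel, 128M of that reserved) can be sketched as a
small standalone C fragment. This is an illustration only, not kernel
code: the macro and function names are made up for the example, and the
128M reserve is simply the figure quoted above.

```c
#include <stdint.h>

#define MIB            (1024ULL * 1024ULL)
#define ADDRESS_SPACE  (1ULL << 32)    /* 4G: all a 32-bit address can reach */
#define SPLIT_3G       0xC0000000ULL   /* userspace below, kernel above */
#define RESERVE_MIB    128ULL          /* vmalloc(), ioremap(), highmem mappings */

/* Hypothetical helper: MiB of address space left to the kernel above the split. */
static uint64_t kernel_space_mib(uint64_t split)
{
    return (ADDRESS_SPACE - split) / MIB;
}

/* Hypothetical helper: MiB of physical memory that can stay permanently mapped. */
static uint64_t lowmem_mib(uint64_t split)
{
    return kernel_space_mib(split) - RESERVE_MIB;
}
```

With the split at SPLIT_3G, kernel_space_mib() gives 1024 and lowmem_mib()
gives 896, the familiar limit above which memory has to be mapped into
kernel space on demand.
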
>
> Another thing, which is significantly easier to do: adjust the split
> down somewhat. I've been told it's against some SysV ABI to go beneath
> 3G for userspace but chances are good you won't care too much. For
> machines with 1G to have to cope with highmem just to get that last 128M
> supported is fairly icky. If in include/asm/page.h you adjust the
> __PAGE_OFFSET define(s) down a bit, that should be all you need. From 3G
> (0xc0000000) to 0xb8000000 (3G-128M) or 0xb0000000 for good measure. A
> patch which does this for you also lives in the -ck tree, available at:
>
> http://members.optusnet.com.au/ckolivas/kernel/
>
> Hope this was useful...
>
> Rene.

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive: http://mail.nl.linux.org/kernelnewbies/
FAQ: http://kernelnewbies.org/faq/