Re: logical/virtual addresses and high-memory

George Zhim <georgezhim@xxxxxxxxx> · Sat, 26 Nov 2005 15:45:24 +0200

Wow. Though I'm not the one who asked the question, I'll
still take the liberty of praising your answer and thanking you!

On 11/26/05, Rene Herman <rene.herman@xxxxxxxxxxxx> wrote:
> Bahadir Balban wrote:
>
> > 1) For a kernel that has high memory starting at 896MB, and processes
> > having pages in high memory, when there's data exchange from those
> > user pages to the kernel or vice versa, does the kernel have to first
> > map those addresses (i.e. modify page tables), and then access them,
> > even if the pages are in-memory?
>
> Yes, via kmap() or kmap_atomic(). Your next question will be if that's
> not horribly slow. See below.
>
> > 2) Since a 4GB space is addressable in a 32-bit system, why would the
> > kernel maintain a 1GB logical space only? LDD3 page 415 says "the
> > biggest consumer of kernel address space is virtual mappings for
> > physical memory." Does this mean the page table entries that keep this
> > mapping consume a lot of kernel space and that's why logical space is
> > kept low?
>
> No. So as to not have to switch pagetables (and therefore flush the TLB,
> the on-CPU pagetable cache, which is a very costly operation) upon each
> entry to and exit from the kernel, kernel and user address space share
> that same 4GB. Userspace normally gets 3GB of it -- you supposedly
> bought the machine to run applications and not so much a kernel --
> leaving 1GB for the kernel. Subtract 128M of address space which is
> reserved for things like vmalloc() and ioremap() (and high memory
> mappings...) and you're at that familiar 896M.
>
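The arithmetic in that paragraph can be written down as a small userspace C
sketch. The function names are made up for illustration and this is not kernel
code; the 128M reserve is simply the figure quoted above and is passed in by
the caller.

```c
#include <stdint.h>

#define MiB (1024ull * 1024ull)
#define GiB (1024ull * MiB)

/* Size of the kernel's share of a 32-bit (4G) address space, given the
 * virtual address where kernel space starts (0xC0000000 for 3G/1G). */
uint64_t kernel_window(uint32_t page_offset)
{
    return 4 * GiB - (uint64_t)page_offset;
}

/* Physical memory that can stay permanently mapped: the kernel window
 * minus the area set aside for vmalloc(), ioremap() and highmem mappings.
 * 'reserve' is an assumption supplied by the caller (128M in the mail). */
uint64_t lowmem_limit(uint32_t page_offset, uint64_t reserve)
{
    uint64_t window = kernel_window(page_offset);

    return window > reserve ? window - reserve : 0;
}
```

Calling lowmem_limit(0xC0000000u, 128 * MiB) yields 896 * MiB, the familiar
limit, out of a kernel window of 1 GiB.
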
> Now, there has been quite a period where 896M was really a lot. All my
> systems up to P2, with 64M in them, get cold chills running down their
> little spines even _thinking_ about addressing that much. In practice
> the sharing seemed to only have upsides: things were much faster than
> they could've been had a TLB flush been necessary upon each transition
> to/from the kernel and only a tiny, specialized, fraction of machines
> ran with insane amounts of memory anyway.
>
> Then, of course, consumer x86 lost its marbles as well and the problem
> was upon all of us. By that time, though, the "thou shalt not flush the
> TLB" mantra was strong enough that the design wasn't reworked much;
> instead we got the high-memory system: the lower part (896M) of memory
> is still permanently mapped as was always the case, and you map memory
> above that into kernel space when required, and unmap it when no longer
> needed.
>
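The "permanently mapped" lower part is what makes a logical address cheap: it
is the physical address plus a constant. Below is a userspace sketch of that
rule, assuming the default 3G/1G split and the 896M limit from the mail. The
function names are hypothetical; in the kernel itself the conversion is done
by the __va() and __pa() macros.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET  0xC0000000u             /* default 3G/1G split */
#define LOWMEM_LIMIT (896u * 1024u * 1024u)  /* 896M, as in the mail */

/* True if the physical address lies above the permanently mapped region,
 * so the kernel needs a temporary mapping before it can touch it.
 * 64-bit argument so that addresses above 4G can be expressed too. */
bool phys_is_highmem(uint64_t phys)
{
    return phys >= LOWMEM_LIMIT;
}

/* Kernel logical address for a low-memory physical address: a constant
 * offset, no pagetable changes needed. Returns 0 for high memory, where
 * no such fixed address exists. */
uint32_t lowmem_phys_to_virt(uint64_t phys)
{
    if (phys_is_highmem(phys))
        return 0;
    return (uint32_t)phys + PAGE_OFFSET;
}
```

Physical address 0x01000000 maps to 0xC1000000 this way, while anything at or
above 0x38000000 (896M) has no fixed kernel address and has to go through
kmap() or kmap_atomic().
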
> Which works. It's certainly not the cleanest approach if you're the kind
> who likes neat schematic drawings of algorithms (I am. Oh am I ever) but
> it works. It's also not very fast, but when a TLB flush is the
> alternative it doesn't easily get worse. Moreover, by the time the
> masses really started running x86 with more than 896M (a year ago to,
> well, now I guess?) x86-64 was "upon us" as well, and 64-bit arches
> obviously do not share this problem. Or at least not for a _very_ long
> time still...
>
> There are also the famous "4G/4G" patches: that code does in fact switch
> pagetables and can thereby give both the user and the kernel its own
> full 4G address space. Reports on the speed penalty have been mixed. The
> code might have made it into -mm (Andrew Morton's testing tree) but I'm
> not certain about that. As to the future, certainly after x86-64 (or
> another 64-bit arch) truly obsoletes x86 I personally believe it might
> be worth it to always use 4G/4G (or at least for machines with more than
> 896M), say "sorry, guys, we don't support more than 4G on x86 anymore",
> and rip out the highmem code. That would certainly make for a more
> easily maintainable VM, and it will probably need to be maintained for a
> long time still for embedded use.
>
> Another thing, which is significantly easier to do: adjust the split
> down somewhat. I've been told it's against some SysV ABI to go beneath
> 3G for userspace but chances are good you won't care too much. For
> machines with 1G to have to cope with highmem just to get that last 128M
> supported is fairly icky. If in include/asm/page.h you adjust the
> __PAGE_OFFSET define(s) down a bit, that should be all you need. From 3G
> (0xc0000000) to 0xb8000000 (3G-128M) or 0xb0000000 for good measure. A
> patch which does this for you also lives in the -ck tree, available at:
>
> http://members.optusnet.com.au/ckolivas/kernel/
>
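To see where 0xb8000000 and 0xb0000000 come from, here is a userspace sketch
that picks the highest split which still lets a given amount of RAM be
permanently mapped. The function name, the 128M step and the fixed 128M
reserve are assumptions taken from the numbers in the mail; the real kernel
may impose its own constraints on __PAGE_OFFSET.

```c
#include <stdint.h>

#define MiB           (1024ull * 1024ull)
#define ADDRESS_SPACE (4096ull * MiB)  /* 4G of 32-bit virtual addresses */
#define RESERVE       (128ull * MiB)   /* vmalloc()/ioremap() area, per the mail */
#define SPLIT_STEP    (128ull * MiB)   /* assumed granularity of the split */

/* Highest SPLIT_STEP-aligned kernel start address that still leaves room
 * to map 'ram' bytes permanently on top of the reserve. Returns 0 if the
 * kernel would need the whole address space. */
uint32_t split_for_ram(uint64_t ram)
{
    uint64_t need = ram + RESERVE;

    /* Round the kernel's requirement up to the next step. */
    need = (need + SPLIT_STEP - 1) / SPLIT_STEP * SPLIT_STEP;
    if (need >= ADDRESS_SPACE)
        return 0;
    return (uint32_t)(ADDRESS_SPACE - need);
}
```

Under these assumptions 896M of RAM gives the stock 0xC0000000, 1G gives
0xB8000000 and 1152M gives 0xB0000000. Every 128M moved to the kernel this
way is 128M less virtual address space for each process.
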
> Hope this was useful...
>
> Rene.
>
> --
> Kernelnewbies: Help each other learn about the Linux kernel.
> Archive:       http://mail.nl.linux.org/kernelnewbies/
> FAQ:           http://kernelnewbies.org/faq/