Re: logical/virtual addresses and high-memory

Rene Herman <rene.herman@xxxxxxxxxxxx> · Sat, 26 Nov 2005 14:25:22 +0100

Bahadir Balban wrote:

> 1) For a kernel that has high memory starting at 896MB, and processes
> having pages in high memory, when there's data exchange from those
> user pages to the kernel or vice versa, does the kernel have to first
> map those addresses (i.e. modify page tables), and then access them,
> even if the pages are in-memory?

Yes, via kmap() or kmap_atomic(). Your next question will be whether 
that isn't horribly slow. See below.

> 2) Since a 4GB space is addressable in a 32-bit system, why would the
> kernel maintain a 1GB logical space only? LDD3 page 415 says "the
> biggest consumer of kernel address space is virtual mappings for
> physical memory." Does this mean the page table entries that keep this
> mapping consume a lot of kernel space and that's why logical space is
> kept low?

No. So as to not have to switch pagetables (and therefore flush the TLB, 
the on-CPU pagetable cache, which is a very costly operation) upon each 
entry to and exit from the kernel, kernel and user address space share 
that same 4GB. Userspace normally gets 3GB of it -- you supposedly 
bought the machine to run applications and not so much a kernel -- 
leaving 1GB for the kernel. Subtract 128M of address space which is 
reserved for things like vmalloc() and ioremap() (and high memory 
mappings...) and you're at that familiar 896M.
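
As a rough sketch of that arithmetic (the macro and function names here are 
made up for illustration, they are not kernel identifiers, and the 3G/128M 
figures are simply the defaults described above):

```c
#include <assert.h>
#include <stdint.h>

#define MiB             (1024ULL * 1024ULL)

/* Illustrative constants, not taken from any kernel header. */
#define ADDRESS_SPACE   (4096ULL * MiB)  /* 32-bit virtual address space: 4GB  */
#define USER_SPACE      (3072ULL * MiB)  /* default share for userspace: 3GB   */
#define KERNEL_RESERVE  (128ULL * MiB)   /* vmalloc(), ioremap(), highmem maps */

/*
 * Size in MiB of the region the kernel can keep permanently mapped:
 * whatever is left of the shared address space after userspace and
 * the reserved window have been subtracted.
 */
static uint64_t lowmem_mib(uint64_t user_space, uint64_t reserve)
{
    assert(user_space + reserve <= ADDRESS_SPACE);
    return (ADDRESS_SPACE - user_space - reserve) / MiB;
}
```

Calling lowmem_mib(USER_SPACE, KERNEL_RESERVE) gives the 896 from the text: 
4096 - 3072 - 128.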

Now, there has been quite a period where 896M was really a lot. All my 
systems up to the P2, with 64M in them, get cold chills running down their 
little spines even _thinking_ about addressing that much. In practice 
the sharing seemed to only have upsides: things were much faster than 
they could've been had a TLB flush been necessary upon each transition 
to/from the kernel and only a tiny, specialized, fraction of machines 
ran with insane amounts of memory anyway.

Then, of course, consumer x86 lost its marbles as well and the problem 
was upon all of us, but by that time the "thou shalt not flush the TLB" 
mantra was too firmly entrenched to redesign things around it, so the 
choice was to go with the high-memory system: the lower part (896M) of 
memory is still 
permanently mapped as was always the case, and you map memory above that 
into kernel space when required, and unmap it when no longer needed.
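
To make the lowmem/highmem distinction concrete, here is a small userspace 
model, not kernel code. It assumes the default 3G split and the 896M limit 
from above; the names are hypothetical. A physical address below the limit 
has a fixed kernel virtual address (physical plus the offset); anything at 
or above it has none, and would need a temporary mapping such as kmap().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for the default 3G/1G split described in the text. */
#define MODEL_PAGE_OFFSET  0xc0000000u              /* start of kernel space */
#define MODEL_LOWMEM_LIMIT (896u * 1024u * 1024u)   /* 896M = 0x38000000     */

/*
 * If phys lies in the permanently mapped region, store its fixed kernel
 * virtual address in *virt and return true. Otherwise return false:
 * that page is "high memory" and has no permanent kernel mapping.
 */
static bool lowmem_virt(uint64_t phys, uint32_t *virt)
{
    assert(virt != NULL);
    if (phys >= MODEL_LOWMEM_LIMIT)
        return false;
    *virt = MODEL_PAGE_OFFSET + (uint32_t)phys;
    return true;
}
```

In this model physical 0x00100000 (1M) sits at 0xc0100000, while physical 
0x38000000 (exactly 896M) is the first address that is not directly mapped.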

Which works. It's certainly not the cleanest approach if you're the kind 
who likes neat schematic drawings of algorithms (I am. Oh am I ever) but 
it works. It's also not very fast, but when a TLB flush is the 
alternative it doesn't easily get worse. Moreover, by the time the 
masses really started running x86 with more than 896M (a year ago to, 
well, now I guess?) x86-64 was "upon us" as well, and 64-bit arches 
obviously do not share this problem. Or at least not for a _very_ long 
time still...

There are also the famous "4G/4G" patches: that code does in fact switch 
pagetables and can thereby give both the user and the kernel its own 
full 4G address space. Reports on the speed penalty have been mixed. The 
code might have made it into -mm (Andrew Morton's testing tree) but I'm 
not certain about that. As to the future, certainly after x86-64 (or 
another 64-bit arch) truly obsoletes x86 I personally believe it might be 
worth it to always use 4G/4G (or at least for machines with more than 
896M), say "sorry, guys, we don't support more than 4G on x86 anymore", 
and rip out the highmem code. That would certainly make for a more easily 
maintainable VM, and x86 will probably need to be maintained for a long 
time still for embedded use.

Another thing, which is significantly easier to do: adjust the split 
down somewhat. I've been told it's against some SysV ABI to go beneath 
3G for userspace but chances are good you won't care too much. For 
machines with 1G to have to cope with highmem just to get that last 128M 
supported is fairly icky. If in include/asm/page.h you adjust the 
__PAGE_OFFSET define(s) down a bit, that should be all you need: from 3G 
(0xc0000000) to 0xb8000000 (3G-128M), or 0xb0000000 (3G-256M) for good 
measure. A 
patch which does this for you also lives in the -ck tree, available at:

http://members.optusnet.com.au/ckolivas/kernel/
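
To see why those particular values help a 1G machine, here is a small 
sketch. The function name is invented for illustration, and it assumes the 
128M reserve mentioned earlier is the only thing subtracted from kernel 
space, which glosses over details of the real kernel:

```c
#include <assert.h>
#include <stdint.h>

#define MiB          (1024ULL * 1024ULL)
#define RESERVE_MIB  128ULL   /* assumed vmalloc()/ioremap() window, in MiB */

/*
 * For a given split (page_offset) and amount of RAM, return how many MiB
 * of RAM would not fit in the permanently mapped region and would
 * therefore have to be handled as high memory.
 */
static uint64_t highmem_mib(uint32_t page_offset, uint64_t ram_mib)
{
    /* Kernel space runs from page_offset up to the 4GB boundary. */
    uint64_t kernel_mib = (0x100000000ULL - page_offset) / MiB;

    assert(kernel_mib > RESERVE_MIB);
    uint64_t lowmem = kernel_mib - RESERVE_MIB;

    return ram_mib > lowmem ? ram_mib - lowmem : 0;
}
```

Under these assumptions a 1G machine has 128M of highmem at 0xc0000000 and 
none at 0xb8000000 or 0xb0000000, at the cost of that much userspace 
address space.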

Hope this was useful...

Rene.

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/