On Feb 17, 2017 3:02 PM, "Andy Lutomirski" <luto@xxxxxxxxxxxxxx> wrote:
> What I'm trying to say is: if we're going to do the route of 48-bit
> limit unless a specific mmap call requests otherwise, can we at least
> have an interface that doesn't suck?

No, I'm not suggesting specific mmap calls at all. I'm suggesting the complete opposite: not having some magical "max address" at all in the VM layer. Keep all the existing TASK_SIZE defines as-is, and just make those be the new 56-bit limit.
But then, so that most processes don't actually use it, just make the default x86 arch_get_free_area() return an address limited to the old 47-bit limit. So effectively all legacy programs work exactly the same way they always did.
Then there are escape mechanisms: the process control that expands that x86 arch_get_free_area() to give high addresses. That would be the normal thing.
But also, exactly *because* we don't make all those TASK_SIZE changes, you could - if you wanted to - use MAP_FIXED to just allocate directly in high virtual space. For example, maybe you just make your own private memory allocator do that, and all the normal stuff would just continue to use the low virtual addresses, and you wouldn't even bother with the prctl().
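
To make the MAP_FIXED escape hatch concrete, here is a minimal user-space sketch of what a private allocator might do under this proposal. The address (1 << 48) and the helper name map_high_fixed() are illustrative assumptions, not anything from an existing interface. On a kernel or CPU without the larger address space the mmap() simply fails and the helper returns NULL.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical target address above the legacy 47-bit user limit.
 * Assumes a 64-bit build; the exact value is only for illustration. */
#define HIGH_ADDR ((uintptr_t)1 << 48)

/* Try to place an anonymous mapping directly in high virtual space.
 * Returns the mapping on success, or NULL if the kernel refuses the
 * address (for example, when high addresses are not supported).
 * Note that MAP_FIXED replaces anything already mapped at that address. */
static void *map_high_fixed(size_t len)
{
    void *want = (void *)HIGH_ADDR;
    void *p = mmap(want, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    if (p == MAP_FAILED)
        return NULL;

    /* With MAP_FIXED a successful mmap() returns exactly the address asked for. */
    assert(p == want);
    return p;
}
```

Everything else in the process (malloc, shared libraries, stacks) would keep getting low addresses, because only this one explicit call names a high address.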
Because let's face it, the number of processes that will want the high virtual addresses is going to be fairly small, and they'll be specialised. Maybe even those will want it only for special things (like mapping a huge area of nonvolatile memory).
So I'm saying:
- don't do all these magical TASK_SIZE things at all
- don't mess with generic mm code at all.
- only change arch_get_free_area() to take one single process control issue into account.
Keep it simple and stupid, and don't make this address size expansion something that the core mm code needs to even know about.
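
As a rough illustration of the policy described above, the following is a user-space model, not kernel code. The struct, the flag name wants_high_addrs, and both helper functions are hypothetical; only the 47-bit and 56-bit limits come from the discussion. The point it tries to show is that TASK_SIZE stays one fixed value, and the only per-process decision is where the default address search stops.

```c
#include <stdbool.h>
#include <stdint.h>

#define LEGACY_LIMIT    ((uint64_t)1 << 47)  /* old 47-bit user limit */
#define TASK_SIZE_MODEL ((uint64_t)1 << 56)  /* new 56-bit limit, same for every process */

/* Hypothetical per-process state: set by the process control (prctl). */
struct proc_model {
    bool wants_high_addrs;
};

/* Upper bound for the default free-area search: legacy programs stay
 * below 47 bits, opted-in programs may be handed high addresses. */
static uint64_t search_limit(const struct proc_model *p)
{
    return p->wants_high_addrs ? TASK_SIZE_MODEL : LEGACY_LIMIT;
}

/* Would a request for [addr, addr + len) be acceptable?
 * An explicit fixed request is only checked against TASK_SIZE, so it
 * can reach high addresses even without the process control. */
static bool range_ok(const struct proc_model *p, uint64_t addr,
                     uint64_t len, bool fixed)
{
    uint64_t limit = fixed ? TASK_SIZE_MODEL : search_limit(p);

    return len <= limit && addr <= limit - len;
}
```

In this model nothing outside search_limit() ever looks at the per-process flag, which is the "core mm code doesn't need to know" part.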
Linus