On Tue, Jan 3, 2017 at 5:18 AM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Monday, January 2, 2017 10:08:28 PM CET Andy Lutomirski wrote:
>>
>> > This seems to nicely address the same problem on arm64, which has
>> > run into the same issue due to the various page table formats
>> > that can currently be chosen at compile time.
>>
>> On further reflection, I think this has very little to do with paging
>> formats except insofar as paging formats make us notice the problem.
>> The issue is that user code wants to be able to assume an upper limit
>> on an address, and it gets an upper limit right now that depends on
>> architecture due to paging formats. But someone really might want to
>> write a *portable* 64-bit program that allocates memory with the high
>> 16 bits clear. So let's add such a mechanism directly.
>>
>> As a thought experiment, what if x86_64 simply never allocated "high"
>> (above 2^47-1) addresses unless a new mmap-with-explicit-limit syscall
>> were used? Old glibc would continue working. Old VMs would work.
>> New programs that want to use ginormous mappings would have to use the
>> new syscall. This would be totally stateless and would have no issues
>> with CRIU.
>
> I can see this working well for the 47-bit addressing default, but
> what about applications that actually rely on 39-bit addressing
> (I'd have to double-check, but I think this was the limit that
> people were most interested in for arm64)?
>
> 39 bits seems a little small to make that the default for everyone
> who doesn't pass the extra flag. Having to pass another flag to
> limit the addresses introduces other problems (e.g. mmap from
> library call that doesn't pass that flag).

That's a fair point. Maybe my straw man isn't so good.

>
>> If necessary, we could also have a prctl that changes a
>> "personality-like" limit that is in effect when the old mmap was used.
>> I say "personality-like" because it would reset under exactly the same
>> conditions that personality resets itself.
>
> For "personality-like", it would still have to interact
> with the existing PER_LINUX32 and PER_LINUX32_3GB flags that
> do the exact same thing, so actually using personality might
> be better.
>
> We still have a few bits in the personality arguments, and
> we could combine them with the existing ADDR_LIMIT_3GB
> and ADDR_LIMIT_32BIT flags that are mutually exclusive by
> definition, such as
>
>         ADDR_LIMIT_32BIT = 0x0800000, /* existing */
>         ADDR_LIMIT_3GB   = 0x8000000, /* existing */
>         ADDR_LIMIT_39BIT = 0x0010000, /* next free bit */
>         ADDR_LIMIT_42BIT = 0x8010000,
>         ADDR_LIMIT_47BIT = 0x0810000,
>         ADDR_LIMIT_48BIT = 0x8810000,
>
> This would probably take only one or two personality bits for the
> limits that are interesting in practice.

Hmm. What if we approached this a bit differently? We could add a
single new personality bit ADDR_LIMIT_EXPLICIT. Setting this bit
causes PER_LINUX32_3GB etc. to be automatically cleared. When
ADDR_LIMIT_EXPLICIT is in effect, prctl can set a 64-bit numeric
limit. If ADDR_LIMIT_EXPLICIT is cleared, the prctl value stops being
settable and reading it via prctl returns whatever is implied by the
other personality bits.

--Andy
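To make the ADDR_LIMIT_EXPLICIT idea above concrete, here is a minimal user-space model of the proposed semantics. It is a sketch, not kernel code. ADDR_LIMIT_32BIT and ADDR_LIMIT_3GB use their values from <linux/personality.h>. Everything else is an assumption made for illustration: the value chosen for ADDR_LIMIT_EXPLICIT (the "next free bit" from the list above), the 47-bit default ceiling, starting the explicit limit at that default, and the hypothetical helpers set_personality(), set_explicit_limit() and get_addr_limit() standing in for personality() and the proposed prctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Existing personality flags; values as in <linux/personality.h>. */
#define ADDR_LIMIT_32BIT 0x0800000u
#define ADDR_LIMIT_3GB   0x8000000u

/* HYPOTHETICAL flag from the proposal; the value is an assumption. */
#define ADDR_LIMIT_EXPLICIT 0x0010000u

static_assert((ADDR_LIMIT_EXPLICIT & (ADDR_LIMIT_32BIT | ADDR_LIMIT_3GB)) == 0,
              "new bit must not overlap the existing ADDR_LIMIT_* bits");

/* Assumed default ceiling for a 64-bit task (47-bit user address space). */
#define DEFAULT_LIMIT (UINT64_C(1) << 47)

/* User-space model of the per-task state (hypothetical). */
struct task_limits {
    unsigned int personality;
    uint64_t explicit_limit; /* used only while ADDR_LIMIT_EXPLICIT is set */
};

/* Model of personality(): turning on ADDR_LIMIT_EXPLICIT clears the legacy
 * ADDR_LIMIT_* bits and (by assumption) starts from the default limit. */
static void set_personality(struct task_limits *t, unsigned int pers)
{
    bool was_explicit = (t->personality & ADDR_LIMIT_EXPLICIT) != 0;

    if (pers & ADDR_LIMIT_EXPLICIT) {
        pers &= ~(ADDR_LIMIT_32BIT | ADDR_LIMIT_3GB);
        if (!was_explicit)
            t->explicit_limit = DEFAULT_LIMIT;
    }
    t->personality = pers;
}

/* Model of the hypothetical prctl setter: refused unless explicit mode is on. */
static bool set_explicit_limit(struct task_limits *t, uint64_t limit)
{
    if (!(t->personality & ADDR_LIMIT_EXPLICIT))
        return false;
    t->explicit_limit = limit;
    return true;
}

/* Model of the hypothetical prctl getter: the explicit value, or whatever the
 * legacy bits imply (simplified; real kernels apply those bits only to
 * 32-bit compat tasks). */
static uint64_t get_addr_limit(const struct task_limits *t)
{
    if (t->personality & ADDR_LIMIT_EXPLICIT)
        return t->explicit_limit;
    if (t->personality & ADDR_LIMIT_3GB)
        return UINT64_C(0xC0000000);
    if (t->personality & ADDR_LIMIT_32BIT)
        return UINT64_C(1) << 32;
    return DEFAULT_LIMIT;
}
```

In this model the only state beyond the personality word is one 64-bit number, which is consulted only while the new bit is set. Clearing the bit falls back to the legacy behaviour, matching the "reading it via prctl returns whatever is implied by the other personality bits" rule.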