On 8.10.2020 20.13, Jann Horn wrote:
On Thu, Oct 8, 2020 at 6:54 PM Topi Miettinen <toiwoton@xxxxxxxxx> wrote:
Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
enables full randomization of memory mappings created with mmap(NULL,
...). With 2, the base of the VMA used for such mappings is random,
but the mappings are created in predictable places within the VMA and
in sequential order. With 3, new VMAs are created to fully randomize
the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
even if not necessary.
[...]
+	if ((flags & MREMAP_MAYMOVE) && randomize_va_space >= 3) {
+		/*
+		 * Caller is happy with a different address, so let's
+		 * move even if not necessary!
+		 */
+		new_addr = arch_mmap_rnd();
+
+		ret = mremap_to(addr, old_len, new_addr, new_len,
+				&locked, flags, &uf, &uf_unmap_early,
+				&uf_unmap);
+		goto out;
+	}
You just pick a random number as the address, and try to place the
mapping there? Won't this fail if e.g. the old address range overlaps
with the new one, causing mremap_to() to bail out at "if (addr +
old_len > new_addr && new_addr + new_len > addr)"?
Thanks for the review. I think overlap would be OK in this case and the
check should be skipped.
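
For reference, the condition quoted above is an ordinary interval
intersection test. A minimal standalone sketch (userspace C, not kernel
code; the function name ranges_overlap is made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Same condition as the one quoted from mremap_to(): true when the
 * half-open ranges [addr, addr + old_len) and
 * [new_addr, new_addr + new_len) intersect.
 * Assumption: neither sum wraps around.
 */
bool ranges_overlap(uint64_t addr, uint64_t old_len,
                    uint64_t new_addr, uint64_t new_len)
{
    assert(addr + old_len >= addr && new_addr + new_len >= new_addr);
    return addr + old_len > new_addr && new_addr + new_len > addr;
}
```

Ranges that merely touch (one ends exactly where the other starts) do
not count as overlapping under this test.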
Also, on Linux, the main program stack is (currently) an expanding
memory mapping that starts out being something like a couple hundred
kilobytes in size. If you allocate memory too close to the main
program stack, and someone then recurses deep enough to need more
memory, the program will crash. It sounds like your patch will
randomly make such programs crash.
Right, especially on 32 bit systems this could be a real problem. I have
limited the stack for tasks in the whole system to 2MB without problems
(most use only 128kB) and on 48 bit virtual address systems the chance
of colliding with a 2MB area would be roughly 1/2^(48-21), which is a
very small number. But perhaps this should still be avoided by not
picking an address too close to the bottom of the stack, say within 64MB
to be sure. That may also make this more useful for 32 bit systems, but
overall I'm not so optimistic there due to increased fragmentation.
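
The arithmetic behind that estimate can be checked with a small sketch.
It assumes a deliberately simplified model: one address drawn uniformly
from the whole 2^va_bits address space, ignoring the size of the new
mapping and the fact that the real randomization range is narrower. The
helper name one_in_n_collision is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns N such that a uniformly random address lands inside a region
 * of region_bytes with odds of about 1 in N, for an address space of
 * 2^va_bits bytes. Simplified model, see the assumptions above.
 */
uint64_t one_in_n_collision(unsigned int va_bits, uint64_t region_bytes)
{
    assert(va_bits > 0 && va_bits < 64 && region_bytes > 0);
    return (UINT64_C(1) << va_bits) / region_bytes;
}
```

With 48 bits and a 2MB stack this gives 1 in 2^27 (134217728). With a
64MB exclusion zone it is 1 in 2^22, and with a nominal 32 bit address
space and a 2MB stack it is only 1 in 2048, which is why 32 bit systems
are the worrying case.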
Also, what's your strategy in general with regards to collisions with
existing mappings? Is your intention to just fall back to the classic
algorithm in that case?
Maybe a different address could be tried (but not infinitely, say 5
times) before falling back to the classic algorithm. This would not be
good for the ASLR, but I haven't seen mremap() used much in my tests.
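
A rough userspace sketch of that bounded-retry idea, not the patch
itself. Everything here is an assumption made for illustration: the
existing mappings are modelled as a plain array of ranges, the random
source is a callback standing in for something like arch_mmap_rnd(), and
"fallback" is whatever address the classic algorithm would have picked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANDOM_TRIES 5

struct range { uint64_t start, end; };        /* half-open [start, end) */
typedef uint64_t (*rand_addr_fn)(void *ctx);  /* hypothetical random source */

/* True if [addr, addr + len) intersects any existing mapping. */
bool collides(const struct range *maps, size_t n, uint64_t addr, uint64_t len)
{
    for (size_t i = 0; i < n; i++)
        if (addr < maps[i].end && maps[i].start < addr + len)
            return true;
    return false;
}

/*
 * Try up to MAX_RANDOM_TRIES random candidates; if all of them collide,
 * return the classic algorithm's address and set *fell_back.
 */
uint64_t pick_address(const struct range *maps, size_t n, uint64_t len,
                      rand_addr_fn rnd, void *ctx,
                      uint64_t fallback, bool *fell_back)
{
    for (int i = 0; i < MAX_RANDOM_TRIES; i++) {
        uint64_t addr = rnd(ctx);

        if (!collides(maps, n, addr, len)) {
            *fell_back = false;
            return addr;
        }
    }
    *fell_back = true;
    return fallback;
}

/* Deterministic stand-in for the random source: replays a fixed list. */
struct replay { const uint64_t *vals; size_t n, pos; };

uint64_t replay_next(void *ctx)
{
    struct replay *r = ctx;
    return r->vals[r->pos++ % r->n];
}
```

The retry bound keeps the worst case cheap, at the cost that a crowded
address space ends up with the predictable classic placement.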
You may want to consider whether it would be better to store
information about free memory per subtree in the VMA tree, together
with the maximum gap size that is already stored in each node, and
then walk down the tree randomly, with the randomness weighted by free
memory in the subtrees, but ignoring subtrees whose gaps are too
small. And for expanding stacks, it might be a good idea for other
reasons as well (locking consistency) to refactor them such that the
size in the VMA tree corresponds to the maximum expansion of the stack
(and if an allocation is about to fail, shrink such stack mappings).
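
To make the suggestion concrete, here is a toy sketch of such a weighted
walk. It is not the kernel's VMA tree: as a simplifying assumption each
node owns one free gap and caches two values for its subtree, the total
free bytes and the largest gap. All names (gap_node, pick_gap, the rnd
callback) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct gap_node {
    uint64_t gap_start, gap_len;   /* free range owned by this node */
    uint64_t subtree_free;         /* sum of gap_len over this subtree */
    uint64_t subtree_max_gap;      /* largest gap_len in this subtree */
    struct gap_node *left, *right;
};

/* Recompute the cached fields from the children; call bottom-up. */
void gap_node_update(struct gap_node *n)
{
    n->subtree_free = n->gap_len;
    n->subtree_max_gap = n->gap_len;
    for (int i = 0; i < 2; i++) {
        const struct gap_node *c = i ? n->right : n->left;

        if (!c)
            continue;
        n->subtree_free += c->subtree_free;
        if (c->subtree_max_gap > n->subtree_max_gap)
            n->subtree_max_gap = c->subtree_max_gap;
    }
}

/* Weight of a subtree: its free bytes, or 0 if no gap in it fits len. */
static uint64_t sub_weight(const struct gap_node *n, uint64_t len)
{
    return (n && n->subtree_max_gap >= len) ? n->subtree_free : 0;
}

/* rnd(bound, ctx) must return a value in [0, bound). */
typedef uint64_t (*rnd_below_fn)(uint64_t bound, void *ctx);

/* Walk down randomly, weighted by free memory; NULL if nothing fits. */
const struct gap_node *pick_gap(const struct gap_node *root, uint64_t len,
                                rnd_below_fn rnd, void *ctx)
{
    const struct gap_node *n = root;

    assert(len > 0);
    if (!n || n->subtree_max_gap < len)
        return NULL;
    for (;;) {
        uint64_t wl = sub_weight(n->left, len);
        uint64_t ws = n->gap_len >= len ? n->gap_len : 0;
        uint64_t wr = sub_weight(n->right, len);
        uint64_t r = rnd(wl + ws + wr, ctx);  /* total is never 0 here */

        if (r < wl)
            n = n->left;
        else if (r < wl + ws)
            return n;
        else
            n = n->right;
    }
}

/* Deterministic stand-in for a random source, for testing only. */
uint64_t rnd_fixed(uint64_t bound, void *ctx)
{
    return *(const uint64_t *)ctx % bound;
}
```

The walk is O(tree height) and never dead-ends, because a subtree is
only entered when its cached maximum gap can hold the request. A caller
would then still pick a random offset inside the returned gap.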
This would reduce the randomization, which I want to avoid. I think the
extra overhead should be OK: if it is unacceptable for a workload or
under some system constraints, use mode '2' instead of '3'.
Instead of a single global sysctl, this could be implemented as a new
personality (or make this model the default and add a compatibility
personality with no or less randomization), so it could be applied to
some tasks only rather than to all of them.
-Topi