On Mon 07-08-17 15:22:57, Michal Hocko wrote:
> This is a user-visible API so make sure you CC linux-api (added)
>
> On Sun 06-08-17 10:04:23, Rik van Riel wrote:
> > v2: fix MAP_SHARED case and kbuild warnings
> >
> > Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> > empty in the child process after fork. This differs from MADV_DONTFORK
> > in one important way.
> >
> > If a child process accesses memory that was MADV_WIPEONFORK, it
> > will get zeroes. The address ranges are still valid, they are just empty.
> >
> > If a child process accesses memory that was MADV_DONTFORK, it will
> > get a segmentation fault, since those address ranges are no longer
> > valid in the child after fork.
> >
> > Since MADV_DONTFORK also seems to be used to allow very large
> > programs to fork in systems with strict memory overcommit restrictions,
> > changing the semantics of MADV_DONTFORK might break existing programs.
> >
> > The use case is libraries that store or cache information, and
> > want to know that they need to regenerate it in the child process
> > after fork.

How do they know that they need to regenerate if they do not get SEGV?
Are they going to assume that a read of zeros is a "must init again"?
Isn't that too fragile? Or do they play other tricks like parsing
/proc/self/smaps and reading the flag from there?

> > Examples of this would be:
> > - systemd/pulseaudio API checks (fail after fork)
> >   (replacing a getpid check, which is too slow without a PID cache)
> > - PKCS#11 API reinitialization check (mandated by specification)
> > - glibc's upcoming PRNG (reseed after fork)
> > - OpenSSL PRNG (reseed after fork)
> >
> > The security benefits of a forking server having a re-initialized
> > PRNG in every child process are pretty obvious. However, due to
> > libraries having all kinds of internal state, and programs getting
> > compiled with many different versions of each library, it is
> > unreasonable to expect calling programs to re-initialize everything
> > manually after fork.
> >
> > A further complication is the proliferation of clone flags,
> > programs bypassing glibc's functions to call clone directly,
> > and programs calling unshare, causing the glibc pthread_atfork
> > hook to not get called.
> >
> > It would be better to have the kernel take care of this automatically.
> >
> > This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
> >
> > https://man.openbsd.org/minherit.2

I would argue that a MAP_$FOO flag would be more appropriate. Or do you
see any cases where such a special mapping would need to change the
semantics and inherit the content over the fork again?

I do not like the madvise approach because it is advice, and as such it
can be ignored/not implemented, and that shouldn't have any correctness
effects on the child process.
-- 
Michal Hocko
SUSE Labs
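[Editor's note] The "read of zeros means must init again" pattern questioned above can be sketched roughly as follows. This is only an illustration, not code from the patch: `struct prng_state` and `wipeonfork_demo()` are hypothetical names, and the sketch assumes a kernel that implements MADV_WIPEONFORK with the semantics described in the quoted changelog (the flag was merged in Linux 4.14). The fallback `#define` is only for libc headers that predate the flag. The check relies on the library keeping a field that is never legitimately zero once initialized, which is exactly the fragility being asked about.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18 /* mainline uapi value; fallback for older libc headers */
#endif

/* Hypothetical library state. "initialized" is nonzero once set up,
 * so an all-zero page is read as "must init again". */
struct prng_state {
    unsigned long initialized;
    unsigned long seed;
};

/* Returns 1 if the child saw zeroed state while the parent kept its
 * data, 0 if not, and -1 if setup failed (for example, the running
 * kernel does not support MADV_WIPEONFORK). */
int wipeonfork_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* The state lives in its own private anonymous mapping. */
    struct prng_state *st = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return -1;

    st->initialized = 1;
    st->seed = 0x1234UL;

    if (madvise(st, (size_t)page, MADV_WIPEONFORK) != 0) {
        munmap(st, (size_t)page);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(st, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        /* Child: the range is still mapped but should read as zeros. */
        _exit((st->initialized == 0 && st->seed == 0) ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(st, (size_t)page);
        return -1;
    }

    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    int parent_ok = (st->initialized == 1 && st->seed == 0x1234UL);
    munmap(st, (size_t)page);
    return (child_ok && parent_ok) ? 1 : 0;
}
```

A caller would treat a return of 1 as "the child observed wiped state", and a library using this scheme would re-seed whenever it finds `initialized == 0`. Note that the sketch says nothing about the objection raised above: if the advice were ignored, the child would silently inherit the parent's state.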