On Fri, 11 Aug 2017 17:28:29 -0400 riel@xxxxxxxxxx wrote:

> From: Rik van Riel <riel@xxxxxxxxxx>
>
> Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> empty in the child process after fork. This differs from MADV_DONTFORK
> in one important way.
>
> If a child process accesses memory that was MADV_WIPEONFORK, it
> will get zeroes. The address ranges are still valid, they are just empty.
>
> If a child process accesses memory that was MADV_DONTFORK, it will
> get a segmentation fault, since those address ranges are no longer
> valid in the child after fork.
>
> Since MADV_DONTFORK also seems to be used to allow very large
> programs to fork in systems with strict memory overcommit restrictions,
> changing the semantics of MADV_DONTFORK might break existing programs.
>
> MADV_WIPEONFORK only works on private, anonymous VMAs.
>
> The use case is libraries that store or cache information, and
> want to know that they need to regenerate it in the child process
> after fork.
>
> Examples of this would be:
> - systemd/pulseaudio API checks (fail after fork)
>   (replacing a getpid check, which is too slow without a PID cache)
> - PKCS#11 API reinitialization check (mandated by specification)
> - glibc's upcoming PRNG (reseed after fork)
> - OpenSSL PRNG (reseed after fork)
>
> The security benefits of a forking server having a re-initialized
> PRNG in every child process are pretty obvious. However, due to
> libraries having all kinds of internal state, and programs getting
> compiled with many different versions of each library, it is
> unreasonable to expect calling programs to re-initialize everything
> manually after fork.
>
> A further complication is the proliferation of clone flags,
> programs bypassing glibc's functions to call clone directly,
> and programs calling unshare, causing the glibc pthread_atfork
> hook to not get called.
>
> It would be better to have the kernel take care of this automatically.

I'll add "The patch also adds MADV_KEEPONFORK, to undo the effects of a
prior MADV_WIPEONFORK." here.

I guess it isn't worth mentioning that these things can cause VMA
merges and splits.

> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -80,6 +80,17 @@ static long madvise_behavior(struct vm_area_struct *vma,
>  		}
>  		new_flags &= ~VM_DONTCOPY;
>  		break;
> +	case MADV_WIPEONFORK:
> +		/* MADV_WIPEONFORK is only supported on anonymous memory. */
> +		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
> +			error = -EINVAL;
> +			goto out;
> +		}
> +		new_flags |= VM_WIPEONFORK;
> +		break;
> +	case MADV_KEEPONFORK:
> +		new_flags &= ~VM_WIPEONFORK;
> +		break;
>  	case MADV_DONTDUMP:
>  		new_flags |= VM_DONTDUMP;
>  		break;

It seems odd to permit MADV_KEEPONFORK against other-than-anon vmas?
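
For anyone who wants to poke at the semantics from userspace, here is a minimal sketch of what the changelog and the quoted hunk describe. It is not part of the patch. The helper names are made up for illustration, and the fallback value 18 for MADV_WIPEONFORK is an assumption for C libraries whose headers do not define it yet. The first helper checks that a child reads zeroes from a private, anonymous range marked MADV_WIPEONFORK while the parent keeps its data; the second checks that a shared anonymous mapping is refused with EINVAL, as the VM_SHARED test in the hunk suggests.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* assumed value, for libcs that lack it */
#endif

/*
 * Hypothetical helper. Returns 1 if the child read zeroes and the parent
 * kept its data, 0 if the child saw the parent's data, and -1 if the
 * kernel refused MADV_WIPEONFORK or some other call failed.
 */
int wipeonfork_child_sees_zero(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int status;
	pid_t pid;

	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xAA, len);		/* stand-in for cached library state */

	if (madvise(p, len, MADV_WIPEONFORK) != 0) {
		munmap(p, len);		/* e.g. EINVAL on a kernel without it */
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		munmap(p, len);
		return -1;
	}
	if (pid == 0) {
		/* Child: the range is still mapped, so this must not fault. */
		_exit(p[0] == 0 ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    p[0] != 0xAA) {		/* parent's copy must be untouched */
		munmap(p, len);
		return -1;
	}

	munmap(p, len);
	return WEXITSTATUS(status) == 0 ? 1 : 0;
}

/*
 * Hypothetical helper. Returns 1 if MADV_WIPEONFORK on a shared anonymous
 * mapping fails with EINVAL, 0 if it is accepted or fails differently,
 * and -1 if mmap failed. A kernel that does not know the advice value at
 * all also returns EINVAL, so a 1 here is only meaningful when the
 * helper above returned 0 or 1.
 */
int wipeonfork_rejects_shared(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	int ret;

	if (p == MAP_FAILED)
		return -1;

	ret = (madvise(p, len, MADV_WIPEONFORK) == -1 && errno == EINVAL);
	munmap(p, len);
	return ret;
}
```

Calling both helpers from a small main() and printing the results is enough to compare kernels. A library would keep a flag or its seed in such a page and treat a zeroed value as "regenerate after fork".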