On Thu, Jun 29, 2023 at 4:34 PM <jeffxu@xxxxxxxxxxxx> wrote:
>
> From: Jeff Xu <jeffxu@xxxxxxxxxx>
>

Please ignore this, I resent V3 with a cover letter.

> Add documentation for sysctl vm.memfd_noexec
>
> Link:https://lore.kernel.org/linux-mm/CABi2SkXUX_QqTQ10Yx9bBUGpN1wByOi_=gZU6WEy5a8MaQY3Jw@xxxxxxxxxxxxxx/T/
> Reported-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
> Signed-off-by: Jeff Xu <jeffxu@xxxxxxxxxx>
> ---
>  Documentation/admin-guide/sysctl/vm.rst | 30 +++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 45ba1f4dc004..621588041a9e 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -424,6 +424,36 @@ e.g., up to one or two maps per allocation.
>
>  The default value is 65530.
>
> +memfd_noexec:
> +=============
> +This pid namespaced sysctl controls memfd_create().
> +
> +The new MFD_NOEXEC_SEAL and MFD_EXEC flags of memfd_create() allows
> +application to set executable bit at creation time.
> +
> +When MFD_NOEXEC_SEAL is set, memfd is created without executable bit
> +(mode:0666), and sealed with F_SEAL_EXEC, so it can't be chmod to
> +be executable (mode: 0777) after creation.
> +
> +when MFD_EXEC flag is set, memfd is created with executable bit
> +(mode:0777), this is the same as the old behavior of memfd_create.
> +
> +The new pid namespaced sysctl vm.memfd_noexec has 3 values:
> +0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
> +   MFD_EXEC was set.
> +1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
> +   MFD_NOEXEC_SEAL was set.
> +2: memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> +
> +The default value is 0.
> +
> +Once set, it can't be downgraded at runtime, i.e. 2=>1, 1=>0
> +are denied.
> +
> +This is pid namespaced sysctl, child processes inherit the parent
> +process's memfd_noexec at the time of fork. Changes to the parent
> +process after fork are not automatically propagated to the child
> +process.
>
>  memory_failure_early_kill:
>  ==========================
> --
> 2.41.0.255.g8b1d071c50-goog
>
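
For anyone skimming the quoted text, the rules for the three sysctl values and the no-downgrade rule can be sketched as a small model. This is only an illustration of the documented behavior in Python, not the kernel implementation: the names `Mode`, `resolve_memfd_mode` and `update_sysctl` are hypothetical, the exception types are arbitrary choices of the sketch, and the handling of both flags being passed together is an assumption because the quoted text does not cover it.

```python
from enum import Enum


class Mode(Enum):
    """How the memfd ends up being created, per the quoted documentation."""
    EXEC = "MFD_EXEC"                # executable bit set (mode 0777)
    NOEXEC_SEAL = "MFD_NOEXEC_SEAL"  # mode 0666 and sealed with F_SEAL_EXEC


def resolve_memfd_mode(sysctl_value, exec_flag=False, noexec_seal_flag=False):
    """Model which mode memfd_create() would use for a given vm.memfd_noexec.

    Raises PermissionError where the documentation says the call is rejected.
    """
    if sysctl_value not in (0, 1, 2):
        raise ValueError("vm.memfd_noexec takes the values 0, 1 or 2")
    if exec_flag and noexec_seal_flag:
        # Assumption of this sketch: the quoted text does not describe
        # this combination, so the model simply refuses it.
        raise ValueError("combination not covered by the quoted text")

    if sysctl_value == 2 and not noexec_seal_flag:
        # 2: memfd_create() without MFD_NOEXEC_SEAL will be rejected.
        raise PermissionError("MFD_NOEXEC_SEAL is required")

    if exec_flag:
        return Mode.EXEC
    if noexec_seal_flag:
        return Mode.NOEXEC_SEAL

    # Neither flag was passed: the sysctl picks the default.
    # 0 acts like MFD_EXEC, 1 acts like MFD_NOEXEC_SEAL.
    return Mode.EXEC if sysctl_value == 0 else Mode.NOEXEC_SEAL


def update_sysctl(current, requested):
    """Model the rule that the value can't be downgraded at runtime."""
    if requested not in (0, 1, 2):
        raise ValueError("vm.memfd_noexec takes the values 0, 1 or 2")
    if requested < current:
        # 2=>1 and 1=>0 are denied.
        raise PermissionError("vm.memfd_noexec cannot be downgraded")
    return requested


if __name__ == "__main__":
    for value in (0, 1):
        print(value, resolve_memfd_mode(value).value)
```

With this model, `resolve_memfd_mode(1, exec_flag=True)` still yields `Mode.EXEC`, since value 1 only changes the default for callers that pass neither flag, while `resolve_memfd_mode(2)` raises.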