Re: [RFC PATCH RESEND v2 1/2] mm/memfd: Add support for F_SEAL_FUTURE_EXEC to memfd

Jeff Xu <jeffxu@xxxxxxxxxxxx> · Mon, 6 Jan 2025 21:21:25 -0800

On Mon, Jan 6, 2025 at 5:26 PM Isaac Manjarres
<isaacmanjarres@xxxxxxxxxx> wrote:
>
> On Mon, Jan 06, 2025 at 09:35:09AM -0800, Jeff Xu wrote:
> > + Kees because this is related to W^X memfd and security.
> >
> > On Fri, Jan 3, 2025 at 7:04 AM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres
> > > <isaacmanjarres@xxxxxxxxxx> wrote:
> > > > Android currently uses the ashmem driver [1] for creating shared memory
> > > > regions between processes. Ashmem buffers can initially be mapped with
> > > > PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the
> > > > ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the
> > > > permissions that the buffer can be mapped with.
> > > >
> > > > Processes can remove the ability to map ashmem buffers as executable to
> > > > ensure that those buffers cannot be exploited to run unintended code.
> > >
> > > Is there really code out there that first maps an ashmem buffer with
> > > PROT_EXEC, then uses the ioctl to remove execute permission for future
> > > mappings? I don't see why anyone would do that.
> > >
> > > > For instance, suppose process A allocates a memfd that is meant to be
> > > > read and written by itself and another process, call it B.
> > > >
> > > > Process A shares the buffer with process B, but process B injects code
> > > > into the buffer, and compromises process A, such that it makes A map
> > > > the buffer with PROT_EXEC. This provides an opportunity for process A
> > > > to run the code that process B injected into the buffer.
> > > >
> > > > If process A had the ability to seal the buffer against future
> > > > executable mappings before sharing the buffer with process B, this
> > > > attack would not be possible.
> > >
> > > I think if you want to enforce such restrictions in a scenario where
> > > the attacker can already make the target process perform
> > > semi-arbitrary syscalls, it would probably be more reliable to enforce
> > > rules on executable mappings with something like SELinux policy and/or
> > > F_SEAL_EXEC.
> > >
> > > I would like to second the suggestion of making this part of F_SEAL_EXEC.
>
> Thanks for taking a look at this patch Jeff! Can you please elaborate
> some more on how F_SEAL_EXEC should be extended to restricting executable
> mappings?
>
> I understand that if a memfd file is non-executable (either because it
> was made non-executable via fchmod() or by being created with
> MFD_NOEXEC_SEAL) one could argue that applying F_SEAL_EXEC to that file
> would also mean preventing any executable mappings. However, it is not
> clear to me if we should tie a file's executable permissions to whether
> or not it can be mapped as executable. For example, shared object
> files don't have to have executable permissions, but processes should
> be able to map them as executable.
>
> The case where we apply F_SEAL_EXEC on an executable memfd also seems
> awkward to me, since memfds can be mapped as executable by default
> so what would happen in that scenario?
>
> I also shared the same concerns in my response to Jann in [1].
>
Apologies for not being clear. I meant the following: when either
1> the memfd is created with MFD_NOEXEC_SEAL, or
2> the memfd is non-executable (NX) and F_SEAL_EXEC is set,
we could also block the memfd from being mapped as executable.

MFD_NOEXEC_SEAL/F_SEAL_EXEC was added in 6fd7353829ca, which is about
2 years old. I'm not sure any application creates a MFD_NOEXEC_SEAL
memfd and still wants to mmap it as executable memory; that would be a
strange use case. It is more logical that applications want to block
both execve() and executable mmap() for a non-executable memfd.
Therefore I think we could reuse the F_SEAL_EXEC bit + NX state for
this feature, for simplicity.
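
To make the proposed rule concrete, here is a minimal userspace sketch of
the predicate, not kernel code. The helper name memfd_exec_map_blocked()
and the SKETCH_* macros are hypothetical, and the seal value is a local
copy of what I believe F_SEAL_EXEC is in the uapi header. Since
MFD_NOEXEC_SEAL creates the memfd without exec bits and with F_SEAL_EXEC
already set, both cases above should reduce to "NX and F_SEAL_EXEC":

```c
#include <stdbool.h>

/* Assumption: mirrors F_SEAL_EXEC from include/uapi/linux/fcntl.h.
 * Defined locally so the sketch builds without kernel headers. */
#define SKETCH_F_SEAL_EXEC 0x0020u
/* The three execute permission bits (user, group, other) of a file mode. */
#define SKETCH_EXEC_BITS   0111u

/*
 * Hypothetical helper: returns true when an executable mapping of the
 * memfd would be refused under the rule proposed in this mail, i.e. the
 * file has no execute bits (NX) and that state is locked by F_SEAL_EXEC.
 */
static bool memfd_exec_map_blocked(unsigned int seals, unsigned int mode)
{
	bool nx = !(mode & SKETCH_EXEC_BITS);
	bool exec_sealed = (seals & SKETCH_F_SEAL_EXEC) != 0;

	return nx && exec_sealed;
}
```

With this rule, an executable memfd that has F_SEAL_EXEC applied stays
mappable as executable, and an NX memfd without the seal is also
unaffected, because its mode could still be changed with fchmod().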

> > > > diff --git a/mm/memfd.c b/mm/memfd.c
> > > > index 5f5a23c9051d..cfd62454df5e 100644
> > > > --- a/mm/memfd.c
> > > > +++ b/mm/memfd.c
> > > > @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
> > > >  }
> > > >
> > > >  #define F_ALL_SEALS (F_SEAL_SEAL | \
> > > > +                    F_SEAL_FUTURE_EXEC |\
> > > >                      F_SEAL_EXEC | \
> > > >                      F_SEAL_SHRINK | \
> > > >                      F_SEAL_GROW | \
> > > > @@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr)
> > > >         return 0;
> > > >  }
> > > >
> > > > +static inline bool is_exec_sealed(unsigned int seals)
> > > > +{
> > > > +       return seals & F_SEAL_FUTURE_EXEC;
> > > > +}
> > > > +
> > > > +static int check_exec_seal(unsigned long *vm_flags_ptr)
> > > > +{
> > > > +       unsigned long vm_flags = *vm_flags_ptr;
> > > > +       unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
> > > > +
> > > > +       /* Executability is not a concern for private mappings. */
> > > > +       if (!(mask & VM_SHARED))
> > > > +               return 0;
> > >
> > > Why is it not a concern for private mappings?
> > >
> > > > +       /*
> > > > +        * New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal
> > > > +        * is active.
> > > > +        */
> > > > +       if (mask & VM_EXEC)
> > > > +               return -EPERM;
> > > > +
> > > > +       /*
> > > > +        * Prevent mprotect() from making an exec-sealed mapping executable in
> > > > +        * the future.
> > > > +        */
> > > > +       *vm_flags_ptr &= ~VM_MAYEXEC;
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > >  int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr)
> > > >  {
> > > >         int err = 0;
> > > >         unsigned int *seals_ptr = memfd_file_seals_ptr(file);
> > > >         unsigned int seals = seals_ptr ? *seals_ptr : 0;
> > > >
> > > > -       if (is_write_sealed(seals))
> > > > +       if (is_write_sealed(seals)) {
> > > >                 err = check_write_seal(vm_flags_ptr);
> > > > +               if (err)
> > > > +                       return err;
> > > > +       }
> > > > +
> > > > +       if (is_exec_sealed(seals))
> > > > +               err = check_exec_seal(vm_flags_ptr);
> > > >
> > memfd_check_seals_mmap is only for mmap() path, right ?
> >
> > How about the mprotect()  path ? i.e.  An attacker can first create a
> > RW VMA mapping for the memfd and later mprotect the VMA to be
> > executable.
> >
> > Similar to the check_write_seal call , we might want to block mprotect
> > for write seal as well.
> >
>
> So when memfd_check_seals_mmap() is called, if the file is exec_sealed,
> check_exec_seal() will not only just check that VM_EXEC is not set,
> but it will also clear VM_MAYEXEC, which will prevent the mapping from
> being changed to executable via mprotect() later.
>
Thanks for the clarification.

The name check_exec_seal() is misleading: "check" implies a read-only
operation, but this function also updates the VMA flags. Maybe rename
it to check_and_update_exec_seal() or something like that?

Do you know which code checks the VM_MAYEXEC flag in the mprotect()
code path? It isn't obvious to me: when I grep for VM_MAYEXEC under
mm/, it shows only one place in mprotect.c, and that one doesn't do
the enforcement.

~/mm/mm$ grep VM_MAYEXEC *
mmap.c: mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
mmap.c: vm_flags &= ~VM_MAYEXEC;
mprotect.c: if (rier && (vma->vm_flags & VM_MAYEXEC))
nommu.c: vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
nommu.c: vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
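
One possible explanation, to be verified against the tree: the check may
never name VM_MAYEXEC because the VM_MAY* bits sit exactly 4 bits above
their VM_* counterparts, so a single shift can compare all three at
once. The sketch below is a userspace model of that idea, not the
kernel's code. The SK_* values are local copies of what I understand the
flag values to be, and sketch_mprotect_allowed() is a hypothetical name:

```c
#include <errno.h>

/* Assumption: local copies of the VM_* flag values from include/linux/mm.h. */
#define SK_VM_READ         0x01ul
#define SK_VM_WRITE        0x02ul
#define SK_VM_EXEC         0x04ul
#define SK_VM_MAYREAD      0x10ul
#define SK_VM_MAYWRITE     0x20ul
#define SK_VM_MAYEXEC      0x40ul
#define SK_VM_ACCESS_FLAGS (SK_VM_READ | SK_VM_WRITE | SK_VM_EXEC)

/* The shift trick only works if each VM_MAY* bit is its VM_* bit << 4. */
_Static_assert((SK_VM_EXEC << 4) == SK_VM_MAYEXEC, "VM_MAYEXEC must be VM_EXEC << 4");

/*
 * Hypothetical model of an mprotect()-style permission check.
 * vma_flags: current flags of the mapping, including its VM_MAY* bits.
 * requested: the new VM_READ/VM_WRITE/VM_EXEC bits being asked for.
 * Returns 0 if every requested bit has its VM_MAY* bit set, else -EACCES.
 */
static int sketch_mprotect_allowed(unsigned long vma_flags, unsigned long requested)
{
	/* Keep the old non-access bits (VM_MAY* etc.), swap in the new access bits. */
	unsigned long newflags = requested | (vma_flags & ~SK_VM_ACCESS_FLAGS);

	/* newflags >> 4 moves each VM_MAY* bit onto its VM_* position. */
	if ((newflags & ~(newflags >> 4)) & SK_VM_ACCESS_FLAGS)
		return -EACCES;

	return 0;
}
```

If the real path works this way, clearing VM_MAYEXEC at mmap() time is
enough to make a later mprotect(PROT_EXEC) fail on that mapping, which
would match Isaac's description even though grep finds no explicit test.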

Thanks
-Jeff

> [1] https://lore.kernel.org/all/Z3x_8uFn2e0EpDqM@xxxxxxxxxx/
>
> Thanks,
> Isaac