Re: [RFC PATCH v1 1/2] mm/memfd: Add support for F_SEAL_FUTURE_EXEC to memfd

Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> · Fri, 6 Dec 2024 21:14:58 +0000

On Fri, Dec 06, 2024 at 12:48:09PM -0800, Isaac Manjarres wrote:
> On Fri, Dec 06, 2024 at 06:19:49PM +0000, Lorenzo Stoakes wrote:
> > On Thu, Dec 05, 2024 at 05:09:22PM -0800, Isaac J. Manjarres wrote:
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index b1b2a24ef82e..c7b96b057fda 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -375,6 +375,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > >  		if (!file_mmap_ok(file, inode, pgoff, len))
> > >  			return -EOVERFLOW;
> > >
> >
> > Not maybe in favour of _where_ in the logic we check this and definitely
> > not in expanding this do_mmap() stuff much further.
> >
> > See comment at bottom though... I have a cunning plan :)
> >
> > > +		if (is_exec_sealed(seals)) {
> >
> > Are we intentionally disallowing a MAP_PRIVATE memfd's mapping's execution?
> > I've not tested this scenario so don't know if we somehow disallow this in
> > another way but note on write checks we only care about shared mappings.
> >
> > I mean one could argue that a MAP_PRIVATE situation is the same as copying
> > the data into an anon buffer and doing what you want with it, here you
> > could argue the same...
> >
> > So probably we should only care about VM_SHARED?
>
> Thanks for taking a look at this!
>
> I'd originally implemented it for just the VM_SHARED case, but after
> discussing it with Kalesh, I changed it to disallow executable
> mappings for both MAP_SHARED and MAP_PRIVATE.
>
> Our thought was that write sealing didn't apply in the MAP_PRIVATE case
> to support COW with MAP_PRIVATE. There's nothing similar to COW with
> execution, so I decided to prevent it for both cases; it also retains
> the same behavior as the ashmem driver.

Hm, yeah I'm not sure that's really justified, I mean what's to stop a
caller from just mapping their own memory mapping executable, copying the
data and executing?

There's also further concerns around execution restriction for instance in
memfd_add_seals():

	/*
	 * SEAL_EXEC implys SEAL_WRITE, making W^X from the start.
	 */
	if (seals & F_SEAL_EXEC && inode->i_mode & 0111)
		seals |= F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_FUTURE_WRITE;

So you probably want to change this to include F_SEAL_FUTURE_EXEC, and note
your proposal interacts negatively with the whole
MFD_NOEXEC_SCOPE_NOEXEC_ENFORCED mode set in vm.memfd_noeec - any system
with this set to '2' will literally not allow you to do what you want if
set to 2.

See https://origin.kernel.org/doc/html/latest/userspace-api/mfd_noexec.html

>
> > > +			/* No new executable mappings if the file is exec sealed. */
> > > +			if (prot & PROT_EXEC)
> >
> > Seems strange to reference a prot flag rather than vma flag, we should have
> > that set up by now.
>
> That makes sense. I can change this to check for VM_EXEC in v2 of this
> series.
> > > +				return -EACCES;
> > > +			/*
> > > +			 * Prevent an initially non-executable mapping from
> > > +			 * later becoming executable via mprotect().
> > > +			 */
> > > +			vm_flags &= ~VM_MAYEXEC;
> > > +		}
> > > +
> >
> > You know, I'm in two minds about this... I explicitly moved logic to
> > do_mmap() in [0] to workaround a chicken-and-egg scenario with having
> > accidentally undone the ability to mmap() read-only F_WRITE_SEALed
> > mappings, which meant I 'may as well' move the 'future proofing' clearing
> > of VM_MAYWRITE for F_SEAL_FUTURE_WRITE too.
> >
> > But now I feel that the use of shmem_mmap() and hugetlbfs_file_mmap() to do
> > _any_ of this is pretty odious in general, we may as well do it all
> > upfront.
> >
> > [0]:https://lore.kernel.org/all/cover.1732804776.git.lorenzo.stoakes@xxxxxxxxxx/
>
> I agree. I really like the idea of handling the future proofing and
> error checking in one place. It makes understanding how these seals
> work a lot easier, and has the added benefit of only worrying about
> the check once rather than having to duplicate it in both shmem_mmap() and
> hugetlbfs_file_mmap().
>
> > >  		flags_mask = LEGACY_MAP_MASK;
> > >  		if (file->f_op->fop_flags & FOP_MMAP_SYNC)
> > >  			flags_mask |= MAP_SYNC;
> > > --
> > > 2.47.0.338.g60cca15819-goog
> > >
> >
> > So actually - overall - could you hold off a bit on this until I've had a
> > think and can perhaps send a patch that refactors this?
> >
> > Then your patch can build on top of that one and we can handle this all in
> > one place and sanely :)
> >
> > Sorry you've kicked off thought processes here and that's often a dangerous
> > thing :P
> Thanks again for reviewing these patches! Happy that I was able to get
> the gears turning :)
>
> I'm really interested in helping with this, so is there any forum you'd
> like to use for collaborating on this or any way I can help?
>
> I'm also more than happy to test out any patches that you'd like!

Thanks, I'm just going to post to the mailing list, this is the discussion
forum I'm making use of for this :)

I will cc- you on my patch and definitely I'd appreciate testing thanks!

But yeah, to be clear I'm not done with reviewing this, I need more time to
digest what you're trying to do here, but you definitely need to think
about the exec limitations.

>
> Thanks,
> Isaac