Re: [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC

Jeff Xu <jeffxu@xxxxxxxxxxxx> · Wed, 2 Aug 2023 17:06:17 -0700

On Wed, Aug 2, 2023 at 12:58 AM David Rheinsberg <david@xxxxxxxxxxxx> wrote:
> >> I think I didn't get my point across. Imagine an application that does *NOT* use sealing, but uses memfds. This application shares memfds with untrusted clients, and does this in a safe way (SIGBUS protected). Everything works fine, unless someone decides to enable `vm.memfd_noexec=2`. Suddenly, the memfd will have sealing enabled *without* the application ever requesting this. Now any untrusted client that got the memfd can add seals to the memfd, even though the creator of the memfd did not enable sealing. This client can now seal WRITES on the memfd, even though it really should not be able to do that.
> >>
> >> (This is not an hypothetical setup, we have such setups for data sharing already)
> >
> > Thanks, this helps me understand your point better.
> >
> > I'm not convinced that sysctl needs to consider the threat model of
> > "someone" changing and breaking an application.  If we follow that
> > threat model, there are a lot of other sysctls to worry about.
> >
> > Also, in the system that you described, if memfd is handled to an
> > untrusted process, not only "sealing"  can cause damage, but also
> > chmod, arbitrary rw,  imo the right approach is to harden the process
> > or mechanism of passing the memfd.
>
> No. The model I describe is carefully designed to hand out file-descriptors to inodes that the clients have *no* access to. They cannot run fchmod(2), unlink(2), etc. All they can do is operate on the open file. And all access to this shared file is properly guarded against possible damage the other concurrent clients can do. The entire model is already hardened against malicious actors.
>
> With the new sysctl, a new attack-vector is introduced, which was not possible before.
>
> I was *explicitly* told to add `MFD_ALLOW_SEALING` for that exact reason when introducing memfd_create(2). So I am a bit baffled why it is now ok to enable sealing behind the users back.
>
> I agree that the new sysctl is a root-only option. But I fail to see *why* it implies `MFD_ALLOW_SEALING`? This behavior is not documented nor is it explained in the original commit-messages, nor mentioned *anywhere*.
>
> >> Thus, setting the security-option `memfd_noexec` *breaks* applications, because it enables sealing. If `MFD_NOEXEC_SEAL` would *not* imply `MFD_ALLOW_SEALING`, this would not be an issue. IOW, why does `MFD_NOEXEC_SEAL` clear `F_SEAL_SEAL` even if `MFD_ALLOW_SEALING` is not set?
> >>
> >
> > If MFD_NOEXEC_SEAL is not desired, then it should not be used to
> > overwrite memfd_create() in this system.
> >
> > For the question of why the sysctl adding a seal without application
> > setting it , the rationale here is, as summary of previous/this
> > emails:
>
> I still think we are not talking about the same thing. I completely understand why you add the seal! I am just questioning why you *CLEAR* `F_SEAL_SEAL`? That is, why do you enable `MFD_ALLOW_SEALING` without the user requesting it? You could just set `F_SEAL_EXEC` without clearing `F_SEAL_SEAL`. And then require `MFD_ALLOW_SEALING` on top to clear `F_SEAL_SEAL`.
>
Ah, I apologize. I didn't read it carefully enough and misunderstood
you. Thanks for the clarification.

The reason F_SEAL_SEAL is cleared is that MFD_NOEXEC_SEAL implies
MFD_ALLOW_SEALING, and it seems reasonable that an application might
want to use sealing, e.g. I imagine an application writing its content
to the memfd and then adding F_SEAL_WRITE.
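
To illustrate that write-then-seal pattern, here is a minimal userspace
sketch. It assumes Linux with a glibc that exposes memfd_create()
(2.27 or later); the function names write_then_seal() and
check_write_then_seal() are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill a memfd, add F_SEAL_WRITE, then confirm
 * that a further write() is refused.
 * Returns 0 if the post-seal write fails with EPERM, -1 otherwise.
 */
int write_then_seal(void)
{
	static const char msg[] = "payload";
	ssize_t n;
	int rejected = 0;
	int fd;

	/* Sealing must be requested explicitly at creation time. */
	fd = memfd_create("sealed-demo", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;

	/* Write the content first, while the file is still writable. */
	n = write(fd, msg, sizeof(msg));
	if (n == (ssize_t)sizeof(msg) &&
	    fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == 0) {
		/* With F_SEAL_WRITE in place, write() should fail with EPERM. */
		n = write(fd, msg, sizeof(msg));
		rejected = (n < 0 && errno == EPERM);
	}

	close(fd);
	return rejected ? 0 : -1;
}

/* Usage check: the sealed memfd must reject the second write. */
void check_write_then_seal(void)
{
	assert(write_then_seal() == 0);
}
```

Note that the sketch passes MFD_ALLOW_SEALING itself, so it works the
same way whether or not MFD_NOEXEC_SEAL implies it.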

Your point is that MFD_ALLOW_SEALING should not be implied by
MFD_NOEXEC_SEAL. An application should still explicitly set
MFD_ALLOW_SEALING.

To me, the _SEAL part of MFD_NOEXEC_SEAL implies that sealing is
allowed, but of course this might not be so clear to anyone other than
me :-), so documentation is indeed necessary.

And with the context you described, I now think your approach is better:
1> Application sets MFD_NOEXEC_SEAL with MFD_ALLOW_SEALING:
    F_SEAL_EXEC is set, F_SEAL_SEAL is clear.
2> Application sets MFD_NOEXEC_SEAL without MFD_ALLOW_SEALING:
    F_SEAL_EXEC and F_SEAL_SEAL are both set.
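
The two cases above could be modelled roughly as follows. This is only
a userspace sketch of the proposed semantics, not the kernel
implementation: proposed_initial_seals(), check_proposed_cases() and
the MODEL_ constants are names made up for illustration, with values
copied from the UAPI headers so the snippet builds without them.

```c
#include <assert.h>

/* Values as in linux/memfd.h and linux/fcntl.h, redefined locally. */
#define MODEL_MFD_ALLOW_SEALING	0x0002U
#define MODEL_MFD_NOEXEC_SEAL	0x0008U
#define MODEL_F_SEAL_SEAL	0x0001U
#define MODEL_F_SEAL_EXEC	0x0020U

/*
 * Hypothetical model of the initial seal set under the proposal:
 * MFD_NOEXEC_SEAL only adds F_SEAL_EXEC, and F_SEAL_SEAL stays set
 * unless the caller also passed MFD_ALLOW_SEALING.
 */
unsigned int proposed_initial_seals(unsigned int flags)
{
	unsigned int seals = 0;

	if (flags & MODEL_MFD_NOEXEC_SEAL)
		seals |= MODEL_F_SEAL_EXEC;
	if (!(flags & MODEL_MFD_ALLOW_SEALING))
		seals |= MODEL_F_SEAL_SEAL;

	return seals;
}

/* Usage check covering cases 1> and 2> from the list above. */
void check_proposed_cases(void)
{
	/* 1> MFD_NOEXEC_SEAL with MFD_ALLOW_SEALING */
	assert(proposed_initial_seals(MODEL_MFD_NOEXEC_SEAL |
				      MODEL_MFD_ALLOW_SEALING) ==
	       MODEL_F_SEAL_EXEC);

	/* 2> MFD_NOEXEC_SEAL without MFD_ALLOW_SEALING */
	assert(proposed_initial_seals(MODEL_MFD_NOEXEC_SEAL) ==
	       (MODEL_F_SEAL_EXEC | MODEL_F_SEAL_SEAL));
}
```

In this model an untrusted peer holding the fd from case 2> cannot add
further seals, which is the property you were asking for.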

> [...]
> >> The downside of `MFD_NOEXEC` is that it might be picked over `MFD_NOEXEC_SEAL` by uneducated users, thus reducing security. But right now, the alternative is that existing code picks `MFD_EXEC` instead and never clears the executable bit, because it is a hassle to do so.
> >>
> >
> > Yes. This is the downside I was thinking about.
> >
> > I lean to believe the kernel API shouldn't be feature rich, it could
> > be simple, optimized towards the majority of user cases, and ideally,
> > is self-explained without devs to look through documentation. For
> > example, if I had to choose one to implement between MFD_NOEXEC and
> > MFD_NOEXEC_SEAL, I would choose MFD_NOEXEC_SEAL because it should be
> > what most users care about.
>
> Well, if we were to go back, we would make MFD_NOEXEC(_SEAL) the default and just add `MFD_EXEC` :)
>
> >> Or is there another reason *not* to include `MFD_NOEXEC`? I am not sure I understand fully why you fight it so vehemently?
> >>
> >
> > I wouldn't add it myself, I hope to convince you not to :-).
> > If you still think it is beneficial to add MFD_NOEXEC (saving one
> > chmod call and making it easy to use), I wouldn't feel bad about that.
> > I would suggest going with documentation to help devs to choose
> > between those two, i.e. recommend MFD_NOEXEC_SEAL in most cases.
>
> Any application that cannot use `F_SEAL_EXEC` (e.g., because its peers verify for historic reasons that the seal is not set) now has to do an extra dance to get the "safer" behavior, rather than getting the "safer" behavior by default. That is, we make it easier to get the unsafe behavior than to get the safe behavior (in this particular scenario).
> Without `MFD_NOEXEC`, it is easier to end up with a 0777 memfd than not. I want the application that desires `S_IXUSR` to jump through hoops, not the application that does *not* require it.
>
I see your points now, i.e. "disallow sealing entirely" is at least as
important as "not able to chmod to add X".
I think a reasonable middle ground is to add MFD_NOEXEC support, with
some documentation to help devs choose between MFD_NOEXEC and
MFD_NOEXEC_SEAL.

Would you like to update your patch to the latest version on Andrew's
branch, add a selftest, and perhaps help with the documentation?

Thanks!
-Jeff