Hi On Tue, Jul 18, 2023, at 9:03 PM, Jeff Xu wrote: > Hi David > > Thanks email and patch for discussion. > > On Fri, Jul 14, 2023 at 4:48 AM David Rheinsberg <david@xxxxxxxxxxxx> wrote: >> >> Add a new flag for memfd_create() called MFD_NOEXEC, which has the >> opposite effect as MFD_EXEC (i.e., it strips the executable bits from >> the inode file mode). >> > I previously thought about having the symmetric flags, such as > MFD_NOEXEC/MFD_EXEC/MFD_NOEXEC_SEAL/MFD_EXEC_SEAL, but decided against > it. The app shall decide beforehand what the memfd is created for, if > it is no-executable, then it should be sealed, such that it can't be > chmod to enable "X" bit. My point is, an application might decide to *not* seal a property, because it knows it has to change it later on. But it might still want to disable the executable bit initially, so to avoid having executable pages around that can be exploited. >> The default mode for memfd_create() has historically been to use 0777 as >> file modes. The new `MFD_EXEC` flag has now made this explicit, paving >> the way to reduce the default to 0666 and thus dropping the executable >> bits for security reasons. Additionally, the `MFD_NOEXEC_SEAL` flag has >> been added which allows this without changing the default right now. >> >> Unfortunately, `MFD_NOEXEC_SEAL` enforces `MFD_ALLOW_SEALING` and >> `F_SEAL_EXEC`, with no alternatives available. This leads to multiple >> issues: >> >> * Applications that do not want to allow sealing either have to use >> `MFD_EXEC` (which we don't want, unless they actually need the >> executable bits), or they must add `F_SEAL_SEAL` immediately on >> return of memfd_create(2) with `MFD_NOEXEC_SEAL`, since this >> implicitly enables sealing. >> >> Note that we explicitly added `MFD_ALLOW_SEALING` when creating >> memfd_create(2), because enabling seals on existing users of shmem >> without them knowing about it can easily introduce DoS scenarios. > > The application that doesn't want MFD_NOEXEC_SEAL can use MFD_EXEC, > the kernel won't add MFD_ALLOW_SEALING implicitly. MFD_EXEC makes the > kernel behave the same as before, this is also why sysctl > vm.memfd_noexec=0 can work seamlessly. > >> It >> is unclear why `MFD_NOEXEC_SEAL` was designed to enable seals, and >> this is especially dangerous with `MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL` >> set via sysctl, since it will silently enable sealing on every memfd >> created. >> > Without sealing, chmod(2) can modify the mfd to be executable, that is > the consideration that MFD_NOEXEC is not provided as an option. > Indeed, current design is "biased" towards promoting MFD_NOEXEC_SEAL > as the safest approach, and try to avoid the pitfall that dev > accidently uses "MFD_NOEXEC" without realizing it can still be > chmod(). I think I didn't get my point across. Imagine an application that does *NOT* use sealing, but uses memfds. This application shares memfds with untrusted clients, and does this in a safe way (SIGBUS protected). Everything works fine, unless someone decides to enable `vm.memfd_noexec=2`. Suddenly, the memfd will have sealing enabled *without* the application ever requesting this. Now any untrusted client that got the memfd can add seals to the memfd, even though the creator of the memfd did not enable sealing. This client can now seal WRITES on the memfd, even though it really should not be able to do that. (This is not an hypothetical setup, we have such setups for data sharing already) Thus, setting the security-option `memfd_noexec` *breaks* applications, because it enables sealing. If `MFD_NOEXEC_SEAL` would *not* imply `MFD_ALLOW_SEALING`, this would not be an issue. IOW, why does ´MFD_NOEXEC_SEAL` clear `F_SEAL_SEAL` even if `MFD_ALLOW_SEALING` is not set? >> * Applications that do not want `MFD_EXEC`, but rely on >> `F_GET_SEALS` to *NOT* return `F_SEAL_EXEC` have no way of achieving >> this other than using `MFD_EXEC` and clearing the executable bits >> manually via fchmod(2). Using `MFD_NOEXEC_SEAL` will set >> `F_SEAL_EXEC` and thus break existing code that hard-codes the >> seal-set. >> >> This is already an issue when sending log-memfds to systemd-journald >> today, which verifies the seal-set of the received FD and fails if >> unknown seals are set. Hence, you have to use `MFD_EXEC` when >> creating memfds for this purpose, even though you really do not need >> the executable bits set. >> >> * Applications that want to enable the executable bit later on, >> currently have no way to create the memfd without it. They have to >> clear the bits immediately after creating it via fchmod(2), or just >> leave them set. >> > Is it OK to do what you want in two steps ? What is the concern there ? i.e. > memfd_create(MFD_EXEC), then chmod to remove the "X" bit. > > I imagine this is probably already what the application does for > creating no-executable mfd before my patch, i.e.: > memfd_create(), then chmod() to remove "X" to remove "X" bit. Yes, correct, this is not a technical issue, but rather an API issue. I don't think most memfd-users are aware that their inode has the executable bit set, and they likely don't want it. But for backwards-compatibility reasons (as noted above), they cannot use `MFD_NOEXEC_SEAL`. Hence, we have to make them explicitly request an executable memfd via `MFD_EXEC` now, even though they clearly do not want this. And then we have to add code to drop the executable immediately afterwards. It don't understand why we don't add out `MFD_NOEXEC` and thus make it a lot easier to patch existing applications? And we make it explicit that these applications don't care for the executable-bit, rather than forcing them to request the executable bit just to drop it immediately. The downside of `MFD_NOEXEC` is that it might be picked over `MFD_NOEXEC_SEAL` by uneducated users, thus reduce security. But right now, the alternative is that existing code picks `MFD_EXEC` instead and never clears the executable bit, because it is a hassle to do so. Or is there another reason *not* to include `MFD_NOEXEC`? I am not sure I understand fully why you fight it so vehemently? Thanks David