On Wed, May 22, 2024 at 4:23 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 15 May 2024 23:11:12 -0700 Jeff Xu <jeffxu@xxxxxxxxxx> wrote:
>
> > On Mon, May 13, 2024 at 12:15 PM Barnabás Pőcze <pobrn@xxxxxxxxxxxxxx> wrote:
> > >
> > > `MFD_NOEXEC_SEAL` should remove the executable bits and set
> > > `F_SEAL_EXEC` to prevent further modifications to the executable
> > > bits as per the comment in the uapi header file:
> > >
> > >   not executable and sealed to prevent changing to executable
> > >
> > > However, currently, it also unsets `F_SEAL_SEAL`, essentially
> > > acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies
> > > that it should be so, and indeed up until the second version
> > > of the of the patchset[0] that introduced `MFD_EXEC` and
> > > `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it
> > > was changed in the third revision of the patchset[1] without
> > > a clear explanation.
> > >
> > > This behaviour is suprising for application developers,
> > > there is no documentation that would reveal that `MFD_NOEXEC_SEAL`
> > > has the additional effect of `MFD_ALLOW_SEALING`.
> > >
> > Ya, I agree that there should be documentation, such as a man page. I will
> > work on that.
> >
> > > So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested.
> > > This is technically an ABI break, but it seems very unlikely that an
> > > application would depend on this behaviour (unless by accident).
> > >
> > > [0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@xxxxxxxxxx/
> > > [1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@xxxxxxxxxx/
> >
> > ...
> >
> > Reviewed-by: Jeff Xu <jeffxu@xxxxxxxxxx>
>
> It's a change to a userspace API, yes?  Please let's have a detailed
> description of why this is OK.  Why it won't affect any existing users.
>
Unfortunately, this is a breaking change that might break an application
if it does the following:

memfd_create("", MFD_NOEXEC_SEAL)
fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE); <-- this will fail under the new
semantics, because the memfd is not sealable.

However, I still think the new semantics are better, because of the
sysctl memfd_noexec_scope.

Currently, when the sysctl is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL, the
kernel adds MFD_NOEXEC_SEAL to memfd_create, and the memfd becomes
sealable.

E.g.
When the sysctl is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL and
the app calls memfd_create("", 0),
the application will get a sealable memfd, which might be a surprise to
the application.

If the app doesn't want this behavior, it needs to do one of the two
things below with the current implementation:
1> Set the sysctl memfd_noexec_scope to 0,
so the kernel doesn't overwrite the memfd_create flags.
2> Modify its code to get a non-sealable NOEXEC memfd:
memfd_create("", MFD_NOEXEC_SEAL)
fcntl(fd, F_ADD_SEALS, F_SEAL_SEAL)

The new semantics work better with the sysctl.

Since memfd noexec is new, maybe no application uses MFD_NOEXEC_SEAL
alone to create a sealable memfd. They most likely use
memfd_create("", MFD_NOEXEC_SEAL|MFD_ALLOW_SEALING) instead.
I think the new semantics might be beneficial in the long term.

If a breaking change is not recommended, the alternative is to introduce
a new flag, MFD_NOEXEC_SEAL_SEAL. (I can't find a better name...)

What do you think?

> Also, please let's give consideration to a -stable backport so that all
> kernel versions will eventually behave in the same manner.
>
Yes. If the new semantics are acceptable, a backport is needed as a
bugfix to all kernel versions.
I can do that if someone helps me with the process.

And sorry about this bug that I created.
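
To make the two cases above concrete, here is a minimal C sketch. It is
only an illustration and not part of the patch: the helper names
probe_noexec_seal_is_sealable() and create_sealable_noexec_memfd() are
made up for this mail, and the fallback #define assumes the uapi value
of MFD_NOEXEC_SEAL (0x0008U) for older libc headers. The first helper
reports which semantics the running kernel implements; the second asks
for both flags explicitly, which should behave the same before and after
the change.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older libc headers may lack this flag; value taken from the uapi header. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif

/*
 * Hypothetical helper: probe what MFD_NOEXEC_SEAL alone gives us.
 * Returns  1 if F_SEAL_WRITE can be added (current semantics, sealable),
 *          0 if adding the seal is refused (proposed semantics),
 *         -1 if the kernel rejects MFD_NOEXEC_SEAL (e.g. too old).
 */
int probe_noexec_seal_is_sealable(void)
{
    int fd = memfd_create("probe", MFD_NOEXEC_SEAL);
    if (fd < 0)
        return -1;

    /* Fails when F_SEAL_SEAL is already set on the memfd. */
    int rc = fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);
    close(fd);
    return rc == 0 ? 1 : 0;
}

/*
 * Hypothetical helper: request sealing explicitly instead of relying on
 * MFD_NOEXEC_SEAL implying it. Returns the fd, or -1 on error.
 */
int create_sealable_noexec_memfd(const char *name)
{
    assert(name != NULL);
    return memfd_create(name, MFD_NOEXEC_SEAL | MFD_ALLOW_SEALING);
}
```

An application that needs a sealable noexec memfd could call
create_sealable_noexec_memfd() and then add F_SEAL_WRITE as before; only
code that passes MFD_NOEXEC_SEAL alone and then adds seals would notice
the difference.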