On Thursday, May 23rd, 2024 at 22:44, Jeff Xu <jeffxu@xxxxxxxxxx> wrote:

> Hi Barnabás
>
> Is it OK if I work on V2? It will be based on your V1 change and
> I will also add more test cases.

Sure, please go ahead. At the very end of this letter you'll find the
commit message that I would have sent in v2; maybe you can salvage some of it.


Regards,
Barnabás Pőcze


>
> Thanks
> -Jeff
>
> -
>
> On Thu, May 23, 2024 at 12:45 PM Andrew Morton
> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, 22 May 2024 19:32:35 -0700 Jeff Xu <jeffxu@xxxxxxxxxx> wrote:
> >
> > > >
> > > > It's a change to a userspace API, yes? Please let's have a detailed
> > > > description of why this is OK. Why it won't affect any existing users.
> > > >
> > >
> > > Unfortunately, this is a breaking change that might break an
> > > application if it does the following:
> > > memfd_create("", MFD_NOEXEC_SEAL)
> > > fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE); <-- this will fail in the new
> > > semantics, due to the memfd not being sealable.
> > >
> > > However, I still think the new semantics is better, the reason being
> > > the sysctl: memfd_noexec_scope
> > > Currently, when the sysctl is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL,
> > > the kernel adds MFD_NOEXEC_SEAL to memfd_create, and the memfd becomes sealable.
> > > E.g.
> > > When the sysctl is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
> > > and the app calls memfd_create("", 0),
> > > the application will get a sealable memfd, which might be a surprise to the application.
> > >
> > > If the app doesn't want this behavior, it will need one of the two options
> > > below in the current implementation:
> > > 1>
> > > set the sysctl: memfd_noexec_scope to 0,
> > > so the kernel doesn't overwrite the memfd_create flags.
> > >
> > > 2>
> > > modify its code to get a non-sealable NOEXEC memfd:
> > > memfd_create("", MFD_NOEXEC_SEAL)
> > > fcntl(fd, F_ADD_SEALS, F_SEAL_SEAL)
> > >
> > > The new semantics works better with the sysctl.
> > >
> > > Since memfd noexec is new, maybe there is no application using
> > > MFD_NOEXEC_SEAL to create a sealable memfd. They most likely use
> > > memfd_create("", MFD_NOEXEC_SEAL | MFD_ALLOW_SEALING) instead.
> > > I think the new semantics will be beneficial in the long term.
> >
> > Yes, it's new so I expect any damage will be small. Please prepare a
> > v2 which fully explains/justifies the thinking for this
> > non-backward-compatible change and which includes the cc:stable.
> >
> >
>

---

memfd: `MFD_NOEXEC_SEAL` should not imply `MFD_ALLOW_SEALING`

`MFD_NOEXEC_SEAL` should remove the executable bits and set `F_SEAL_EXEC`
to prevent further modifications to the executable bits as per the
comment in the uapi header file:

  not executable and sealed to prevent changing to executable

However, currently, it also unsets `F_SEAL_SEAL`, essentially acting as a
superset of `MFD_ALLOW_SEALING`. Nothing implies that it should be so, and
indeed up until the second version of the patchset[0] that introduced
`MFD_EXEC` and `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed; however,
it was changed in the third revision of the patchset[1] without a clear
explanation.

This behaviour is surprising for application developers, as there is no
documentation that would reveal that `MFD_NOEXEC_SEAL` has the additional
effect of `MFD_ALLOW_SEALING`. Additionally, combined with
`vm.memfd_noexec=2`, it has the effect of making all memfds initially
sealable.

So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested,
thereby returning to the pre-Linux 6.3 behaviour of only allowing
sealing when `MFD_ALLOW_SEALING` is specified.

Now, this is technically a uAPI break. However, the damage is expected
to be minimal. To trigger a user-visible change, a program has to do the
following steps:

 - create a memfd:
   - with `MFD_NOEXEC_SEAL`,
   - without `MFD_ALLOW_SEALING`;
 - try to add seals / check the seals.
But that seems unlikely to happen intentionally, since this change
essentially reverts the kernel's behaviour to that of Linux <6.3,
so if a program worked correctly on those older kernels, it will
likely work correctly after this change.

I have used Debian Code Search and GitHub to try to find potential
breakages, and I could only find a single one. dbus-broker's
memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING`
behaviour, and tries to work around it[2]. This workaround will break.
Luckily, this only affects the test suite; it does not affect
the normal operations of dbus-broker. There is a PR with a fix[3].

There was also a previous attempt to address this peculiarity by
introducing a new flag[4].

[0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@xxxxxxxxxx/
[1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@xxxxxxxxxx/
[2]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a8784cb/src/util/misc.c#L114
[3]: https://github.com/bus1/dbus-broker/pull/366
[4]: https://lore.kernel.org/lkml/20230714114753.170814-1-david@xxxxxxxxxxxx/

Cc: stable@xxxxxxxxxxxxxxx
Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Barnabás Pőcze <pobrn@xxxxxxxxxxxxxx>
Reviewed-by: Jeff Xu <jeffxu@xxxxxxxxxx>
Reviewed-by: David Rheinsberg <david@xxxxxxxxxxxx>
---