On Thu, Jan 02, 2025 at 03:32:49PM -0800, Isaac J. Manjarres wrote: > * Resending because I accidentally forgot to include Lorenzo in the > "to" list. > > Android uses the ashmem driver [1] for creating shared memory regions > between processes. The ashmem driver exposes an ioctl command for > processes to restrict the permissions an ashmem buffer can be mapped > with. > > Buffers are created with the ability to be mapped as readable, writable, > and executable. Processes remove the ability to map some ashmem buffers > as executable to ensure that those buffers cannot be used to inject > malicious code for another process to run. Other buffers retain their > ability to be mapped as executable, as these buffers can be used for > just-in-time (JIT) compilation. So there is a need to be able to remove > the ability to map a buffer as executable on a per-buffer basis. > > Android is currently trying to migrate towards replacing its ashmem > driver usage with memfd. Part of the transition involved introducing a > library that serves to abstract away how shared memory regions are > allocated (i.e. ashmem vs memfd). This allows clients to use a single > interface for restricting how a buffer can be mapped without having to > worry about how it is handled for ashmem (through the ioctl > command mentioned earlier) or memfd (through file seals). > > While memfd has support for preventing buffers from being mapped as > writable beyond a certain point in time (thanks to > F_SEAL_FUTURE_WRITE), it does not have a similar interface to prevent > buffers from being mapped as executable beyond a certain point. > However, that could be implemented as a file seal (F_SEAL_FUTURE_EXEC) > which works similarly to F_SEAL_FUTURE_WRITE. > > F_SEAL_FUTURE_WRITE was chosen as a template for how this new seal > should behave, instead of F_SEAL_WRITE, for the following reasons: > > 1. Having the new seal behave like F_SEAL_FUTURE_WRITE matches the > behavior that was present with ashmem. This aids in seamlessly > transitioning clients away from ashmem to memfd. I do appreciate the background and of course this is the motivation for the change (and I appreciate the transparency), however it is important to note that _any_ company's internal tooling is of absolutely no relevance to upstream. All justification has to be applied in general terms and be what's right for the core kernel and all users. Hopefully we can achieve a happy scenario where your aims internally align nicely with what's right for (or at least not harmful to) upstream, but obviously justification must be made in terms of the benefits of this change _generally_. > > 2. Making the new seal behave like F_SEAL_WRITE would mean that no > mappings that could become executable in the future (i.e. via > mprotect()) can exist when the seal is applied. However, there are > known cases (e.g. CursorWindow [2]) where restrictions are applied > on how a buffer can be mapped after a mapping has already been made. > That mapping may have VM_MAYEXEC set, which would not allow the seal > to be applied successfully. All mappings have VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC set unless explicitly cleared. I mean this fits in again with the whole problem with what you might call a 'raw' seal and what I addressed with my various patches around write-sealing - that is, unless you explicitly provide some means of clearing the relevant flag on a non-problematic mapping of a sealed memfd, you will simply _not_ be able to mmap() the fd (but rather have to interact with it some other way). This feels honestly like a design flaw and really as if the future variants should always have been how it functioned but I may be missing some context. Anyway this is a side-note and not really relevant to this series, per-se (in fact it argues _for_ it :>) > > Therefore, the F_SEAL_FUTURE_EXEC seal was designed to have the same > semantics as F_SEAL_FUTURE_WRITE. > > Note: this series depends on Lorenzo's work [3], [4], [5] from Andrew > Morton's mm-unstable branch [6], which reworks memfd's file seal checks, > allowing for newer file seals to be implemented in a cleaner fashion. :>) > > Changes from v1 ==> v2: > > - Changed the return code to be -EPERM instead of -EACCES when > attempting to map an exec sealed file with PROT_EXEC to align > to mmap()'s man page. Thank you Kalesh Singh for spotting this! > > - Rebased on top of Lorenzo's work to cleanup memfd file seal checks in > mmap() ([3], [4], and [5]). Thank you for this Lorenzo! :>)) > > - Changed to deny PROT_EXEC mappings only if the mapping is shared, > instead of for both shared and private mappings, after discussing > this with Lorenzo. Yeah I am happy to continue the discussion on this, I very well might be missing something/just wrong here. Security is not my area of expertise but I do need to understand the details of the attack vector to be convinced here (we very well might need an 'explain like I am 5 years old' deal here). > > Opens: > > - Lorenzo brought up that this patch may negatively impact the usage of > MFD_NOEXEC_SCOPE_NOEXEC_ENFORCED [7]. However, it is not clear to me > why that is the case. At the moment, my intent is for the executable > permissions of the file to be disjoint from the ability to create > executable mappings. Sorry for not getting back to you on this, holidays intervened... If the vm.memfd_noexec sysctl is set to 2, you are simply not permitted to allow executable mappings _at all_. see check_sysctl_memfd_noexec(): if (!(*flags & MFD_NOEXEC_SEAL) && sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED) { pr_err_ratelimited( "%s[%d]: memfd_create() requires MFD_NOEXEC_SEAL with vm.memfd_noexec=%d\n", current->comm, task_pid_nr(current), sysctl); return -EACCES; } MFD_NOEXEC_SEAL naturally conflicts with a future noexec flag. I suppose you could say in this instance, that any system that sets this is not compatible with your flag. But you do need to address this. > > Links: > > [1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline:common/drivers/staging/android/ashmem.c > [2] https://developer.android.com/reference/android/database/CursorWindow > [3] https://lore.kernel.org/all/cover.1732804776.git.lorenzo.stoakes@xxxxxxxxxx/ > [4] https://lkml.kernel.org/r/20241206212846.210835-1-lorenzo.stoakes@xxxxxxxxxx > [5] https://lkml.kernel.org/r/7dee6c5d-480b-4c24-b98e-6fa47dbd8a23@lucifer.local > [6] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/tree/?h=mm-unstable > [7] https://lore.kernel.org/all/3a53b154-1e46-45fb-a559-65afa7a8a788@lucifer.local/ > > Links to previous versions: > > v1: https://lore.kernel.org/all/20241206010930.3871336-1-isaacmanjarres@xxxxxxxxxx/ > > Isaac J. Manjarres (2): > mm/memfd: Add support for F_SEAL_FUTURE_EXEC to memfd > selftests/memfd: Add tests for F_SEAL_FUTURE_EXEC > > include/uapi/linux/fcntl.h | 1 + > mm/memfd.c | 39 ++++++++++- > tools/testing/selftests/memfd/memfd_test.c | 79 ++++++++++++++++++++++ I'd like to see some documentation changes too. You need to update userspace-api/mfd_noexec.rst for a start to reference the issue raised above. But it'd be good to actually document this. It'd be good to send a follow up patch to the man pages too should this series end up landing, but we can come to that later... :) > 3 files changed, 118 insertions(+), 1 deletion(-) > > -- > 2.47.1.613.gc27f4b7a9f-goog > Thanks for chasing up with a v2, appreciate your hard work, and sorry for not getting back sooner, have had something of a backlog... (and it seems, this shall continue, indefinitely ;)