Re: [RFC PATCH v1 1/2] mm/memfd: Add support for F_SEAL_FUTURE_EXEC to memfd

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 05, 2024 at 05:09:22PM -0800, Isaac J. Manjarres wrote:
> Android currently uses the ashmem driver [1] for creating shared memory
> regions between processes. Ashmem buffers can initially be mapped with
> PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the
> ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the
> permissions that the buffer can be mapped with.
>
> Processes can remove the ability to map ashmem buffers as executable to
> ensure that those buffers cannot be exploited to run unintended code.
> We are currently trying to replace ashmem with memfd. However, memfd
> does not have a provision to permanently remove the ability to map a
> buffer as executable. Although, this should be something that can be
> achieved via a new file seal.
>
> There are known usecases (e.g. CursorWindow [2]) where a process
> maps a buffer with read/write permissions before restricting the buffer
> to being mapped as read-only for future mappings.
>
> The resulting VMA from the writable mapping has VM_MAYEXEC set, meaning
> that mprotect() can change the mapping to be executable. Therefore,
> implementing the seal similar to F_SEAL_WRITE would not be appropriate,
> since it would not work with the CursorWindow usecase. This is because
> the CursorWindow process restricts the mapping permissions to read-only
> after the writable mapping is created. So, adding a file seal for
> executable mappings that operates like F_SEAL_WRITE would fail.
>
> Therefore, add support for F_SEAL_FUTURE_EXEC, which is handled
> similarly to F_SEAL_FUTURE_WRITE. This ensures that CursorWindow can
> continue to create a writable mapping initially, and then restrict the
> permissions on the buffer to be mappable as read-only by using both
> F_SEAL_FUTURE_WRITE and F_SEAL_FUTURE_EXEC. After the seal is
> applied, any calls to mmap() with PROT_EXEC will fail.
>
> [1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline:common/drivers/staging/android/ashmem.c
> [2] https://developer.android.com/reference/android/database/CursorWindow
>
> Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> Cc: Kalesh Singh <kaleshsingh@xxxxxxxxxx>
> Cc: John Stultz <jstultz@xxxxxxxxxx>
> Signed-off-by: Isaac J. Manjarres <isaacmanjarres@xxxxxxxxxx>
> ---
>  include/linux/mm.h         |  5 +++++
>  include/uapi/linux/fcntl.h |  1 +
>  mm/memfd.c                 |  1 +
>  mm/mmap.c                  | 11 +++++++++++
>  4 files changed, 18 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 4eb8e62d5c67..40c03a491e45 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4096,6 +4096,11 @@ static inline bool is_write_sealed(int seals)
>  	return seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE);
>  }
>
> +static inline bool is_exec_sealed(int seals)
> +{
> +	return seals & F_SEAL_FUTURE_EXEC;
> +}
> +
>  /**
>   * is_readonly_sealed - Checks whether write-sealed but mapped read-only,
>   *                      in which case writes should be disallowing moving
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index 6e6907e63bfc..ef066e524777 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -49,6 +49,7 @@
>  #define F_SEAL_WRITE	0x0008	/* prevent writes */
>  #define F_SEAL_FUTURE_WRITE	0x0010  /* prevent future writes while mapped */
>  #define F_SEAL_EXEC	0x0020  /* prevent chmod modifying exec bits */
> +#define F_SEAL_FUTURE_EXEC	0x0040 /* prevent future executable mappings */
>  /* (1U << 31) is reserved for signed error codes */
>
>  /*
> diff --git a/mm/memfd.c b/mm/memfd.c
> index 35a370d75c9a..77b49995a044 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -184,6 +184,7 @@ unsigned int *memfd_file_seals_ptr(struct file *file)
>  }
>
>  #define F_ALL_SEALS (F_SEAL_SEAL | \
> +		     F_SEAL_FUTURE_EXEC |\
>  		     F_SEAL_EXEC | \
>  		     F_SEAL_SHRINK | \
>  		     F_SEAL_GROW | \
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b1b2a24ef82e..c7b96b057fda 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -375,6 +375,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  		if (!file_mmap_ok(file, inode, pgoff, len))
>  			return -EOVERFLOW;
>

Not maybe in favour of _where_ in the logic we check this and definitely
not in expanding this do_mmap() stuff much further.

See comment at bottom though... I have a cunning plan :)

> +		if (is_exec_sealed(seals)) {

Are we intentionally disallowing a MAP_PRIVATE memfd's mapping's execution?
I've not tested this scenario so don't know if we somehow disallow this in
another way but note on write checks we only care about shared mappings.

I mean one could argue that a MAP_PRIVATE situation is the same as copying
the data into an anon buffer and doing what you want with it, here you
could argue the same...

So probably we should only care about VM_SHARED?

> +			/* No new executable mappings if the file is exec sealed. */
> +			if (prot & PROT_EXEC)

Seems strange to reference a prot flag rather than vma flag, we should have
that set up by now.

> +				return -EACCES;
> +			/*
> +			 * Prevent an initially non-executable mapping from
> +			 * later becoming executable via mprotect().
> +			 */
> +			vm_flags &= ~VM_MAYEXEC;
> +		}
> +

You know, I'm in two minds about this... I explicitly moved logic to
do_mmap() in [0] to workaround a chicken-and-egg scenario with having
accidentally undone the ability to mmap() read-only F_WRITE_SEALed
mappings, which meant I 'may as well' move the 'future proofing' clearing
of VM_MAYWRITE for F_SEAL_FUTURE_WRITE too.

But now I feel that the use of shmem_mmap() and hugetlbfs_file_mmap() to do
_any_ of this is pretty odious in general, we may as well do it all
upfront.

[0]:https://lore.kernel.org/all/cover.1732804776.git.lorenzo.stoakes@xxxxxxxxxx/

>  		flags_mask = LEGACY_MAP_MASK;
>  		if (file->f_op->fop_flags & FOP_MMAP_SYNC)
>  			flags_mask |= MAP_SYNC;
> --
> 2.47.0.338.g60cca15819-goog
>

So actually - overall - could you hold off a bit on this until I've had a
think and can perhaps send a patch that refactors this?

Then your patch can build on top of that one and we can handle this all in
one place and sanely :)

Sorry you've kicked off thought processes here and that's often a dangerous
thing :P




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux