Re: [PATCH -next 1/2] mm/memfd: make F_SEAL_FUTURE_WRITE seal more robust

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 21, 2018 at 6:27 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 20 Nov 2018 13:13:35 -0800 Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> > > > I am Ok with whatever Andrew wants to do, if it is better to squash it with
> > > > the original, then I can do that and send another patch.
> > > >
> > > >
> > >
> > > From experience, Andrew will food in fixups on request :)
> >
> > Andrew, could you squash this patch into the one titled ("mm: Add an
> > F_SEAL_FUTURE_WRITE seal to memfd")?
>
> Sure.
>
> I could of course queue them separately but I rarely do so - I don't
> think that the intermediate development states are useful in the
> infinite-term, and I make them available via additional Link: tags in
> the changelog footers anyway.
>
> I think that the magnitude of these patches is such that John Stultz's
> Reviewed-by is invalidated, so this series is now in the "unreviewed"
> state.
>
> So can we have a re-review please?  For convenience, here's the
> folded-together [1/1] patch, as it will go to Linus.
>
>
> From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx>
> Subject: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd
>
> Android uses ashmem for sharing memory regions.  We are looking forward to
> migrating all usecases of ashmem to memfd so that we can possibly remove
> the ashmem driver in the future from staging while also benefiting from
> using memfd and contributing to it.  Note staging drivers are also not ABI
> and generally can be removed at anytime.
>
> One of the main usecases Android has is the ability to create a region and
> mmap it as writeable, then add protection against making any "future"
> writes while keeping the existing already mmap'ed writeable-region active.
> This allows us to implement a usecase where receivers of the shared
> memory buffer can get a read-only view, while the sender continues to
> write to the buffer.  See CursorWindow documentation in Android for more
> details:
> https://developer.android.com/reference/android/database/CursorWindow
>
> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> which prevents any future mmap and write syscalls from succeeding while
> keeping the existing mmap active.  The following program shows the seal
> working in action:
>
>  #include <stdio.h>
>  #include <errno.h>
>  #include <sys/mman.h>
>  #include <linux/memfd.h>
>  #include <linux/fcntl.h>
>  #include <asm/unistd.h>
>  #include <unistd.h>
>  #define F_SEAL_FUTURE_WRITE 0x0010
>  #define REGION_SIZE (5 * 1024 * 1024)
>
> int memfd_create_region(const char *name, size_t size)
> {
>     int ret;
>     int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
>     if (fd < 0) return fd;
>     ret = ftruncate(fd, size);
>     if (ret < 0) { close(fd); return ret; }
>     return fd;
> }
>
> int main() {
>     int ret, fd;
>     void *addr, *addr2, *addr3, *addr1;
>     ret = memfd_create_region("test_region", REGION_SIZE);
>     printf("ret=%d\n", ret);
>     fd = ret;
>
>     // Create map
>     addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>     if (addr == MAP_FAILED)
>             printf("map 0 failed\n");
>     else
>             printf("map 0 passed\n");
>
>     if ((ret = write(fd, "test", 4)) != 4)
>             printf("write failed even though no future-write seal "
>                    "(ret=%d errno =%d)\n", ret, errno);
>     else
>             printf("write passed\n");
>
>     addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>     if (addr1 == MAP_FAILED)
>             perror("map 1 prot-write failed even though no seal\n");
>     else
>             printf("map 1 prot-write passed as expected\n");
>
>     ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE |
>                                  F_SEAL_GROW |
>                                  F_SEAL_SHRINK);
>     if (ret == -1)
>             printf("fcntl failed, errno: %d\n", errno);
>     else
>             printf("future-write seal now active\n");
>
>     if ((ret = write(fd, "test", 4)) != 4)
>             printf("write failed as expected due to future-write seal\n");
>     else
>             printf("write passed (unexpected)\n");
>
>     addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>     if (addr2 == MAP_FAILED)
>             perror("map 2 prot-write failed as expected due to seal\n");
>     else
>             printf("map 2 passed\n");
>
>     addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
>     if (addr3 == MAP_FAILED)
>             perror("map 3 failed\n");
>     else
>             printf("map 3 prot-read passed as expected\n");
> }
>
> The output of running this program is as follows:
> ret=3
> map 0 passed
> write passed
> map 1 prot-write passed as expected
> future-write seal now active
> write failed as expected due to future-write seal
> map 2 prot-write failed as expected due to seal
> : Permission denied
> map 3 prot-read passed as expected
>
> [joel@xxxxxxxxxxxxxxxxx: make F_SEAL_FUTURE_WRITE seal more robust]
>   Link: http://lkml.kernel.org/r/20181120052137.74317-1-joel@xxxxxxxxxxxxxxxxx
> Link: http://lkml.kernel.org/r/20181108041537.39694-1-joel@xxxxxxxxxxxxxxxxx
> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> Cc: John Stultz <john.stultz@xxxxxxxxxx>
> Cc: John Reck <jreck@xxxxxxxxxx>
> Cc: Todd Kjos <tkjos@xxxxxxxxxx>
> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Daniel Colascione <dancol@xxxxxxxxxx>
> Cc: J. Bruce Fields <bfields@xxxxxxxxxxxx>
> Cc: Jeff Layton <jlayton@xxxxxxxxxx>
> Cc: Khalid Aziz <khalid.aziz@xxxxxxxxxx>
> Cc: Lei Yang <Lei.Yang@xxxxxxxxxxxxx>
> Cc: Marc-Andr Lureau <marcandre.lureau@xxxxxxxxxx>
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Cc: Minchan Kim <minchan@xxxxxxxxxx>
> Cc: Shuah Khan <shuah@xxxxxxxxxx>
> Cc: Valdis Kletnieks <valdis.kletnieks@xxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Jann Horn <jannh@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
>
> --- a/include/uapi/linux/fcntl.h~mm-add-an-f_seal_future_write-seal-to-memfd
> +++ a/include/uapi/linux/fcntl.h
> @@ -41,6 +41,7 @@
>  #define F_SEAL_SHRINK  0x0002  /* prevent file from shrinking */
>  #define F_SEAL_GROW    0x0004  /* prevent file from growing */
>  #define F_SEAL_WRITE   0x0008  /* prevent writes */
> +#define F_SEAL_FUTURE_WRITE    0x0010  /* prevent future writes while mapped */
>  /* (1U << 31) is reserved for signed error codes */
>
>  /*
> --- a/mm/memfd.c~mm-add-an-f_seal_future_write-seal-to-memfd
> +++ a/mm/memfd.c
> @@ -131,7 +131,8 @@ static unsigned int *memfd_file_seals_pt
>  #define F_ALL_SEALS (F_SEAL_SEAL | \
>                      F_SEAL_SHRINK | \
>                      F_SEAL_GROW | \
> -                    F_SEAL_WRITE)
> +                    F_SEAL_WRITE | \
> +                    F_SEAL_FUTURE_WRITE)
>
>  static int memfd_add_seals(struct file *file, unsigned int seals)
>  {
> --- a/fs/hugetlbfs/inode.c~mm-add-an-f_seal_future_write-seal-to-memfd
> +++ a/fs/hugetlbfs/inode.c
> @@ -530,7 +530,7 @@ static long hugetlbfs_punch_hole(struct
>                 inode_lock(inode);
>
>                 /* protected by i_mutex */
> -               if (info->seals & F_SEAL_WRITE) {
> +               if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) {
>                         inode_unlock(inode);
>                         return -EPERM;
>                 }
> --- a/mm/shmem.c~mm-add-an-f_seal_future_write-seal-to-memfd
> +++ a/mm/shmem.c
> @@ -2119,6 +2119,23 @@ out_nomem:
>
>  static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
>  {
> +       struct shmem_inode_info *info = SHMEM_I(file_inode(file));
> +
> +       /*
> +        * New PROT_READ and MAP_SHARED mmaps are not allowed when "future

PROT_WRITE, perhaps?

> +        * write" seal active.
> +        */
> +       if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE) &&
> +           (info->seals & F_SEAL_FUTURE_WRITE))
> +               return -EPERM;
> +
> +       /*
> +        * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED read-only
> +        * mapping, take care to not allow mprotect to revert protections.
> +        */
> +       if (info->seals & F_SEAL_FUTURE_WRITE)
> +               vma->vm_flags &= ~(VM_MAYWRITE);
> +

This might all be clearer as:

if (info->seals & F_SEAL_FUTURE_WRITE) {
  if (vma->vm_flags ...)
    return -EPERM;
  vma->vm_flags &= ~VM_MAYWRITE;
}

with appropriate comments inserted.



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux