Re: [PATCH v4 2/6] shm: add sealing API

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 20 Jul 2014, David Herrmann wrote:

> If two processes share a common memory region, they usually want some
> guarantees to allow safe access. This often includes:
>   - one side cannot overwrite data while the other reads it
>   - one side cannot shrink the buffer while the other accesses it
>   - one side cannot grow the buffer beyond previously set boundaries
> 
> If there is a trust-relationship between both parties, there is no need
> for policy enforcement. However, if there's no trust relationship (eg.,
> for general-purpose IPC) sharing memory-regions is highly fragile and
> often not possible without local copies. Look at the following two
> use-cases:
>   1) A graphics client wants to share its rendering-buffer with a
>      graphics-server. The memory-region is allocated by the client for
>      read/write access and a second FD is passed to the server. While
>      scanning out from the memory region, the server has no guarantee that
>      the client doesn't shrink the buffer at any time, requiring rather
>      cumbersome SIGBUS handling.
>   2) A process wants to perform an RPC on another process. To avoid huge
>      bandwidth consumption, zero-copy is preferred. After a message is
>      assembled in-memory and a FD is passed to the remote side, both sides
>      want to be sure that neither modifies this shared copy, anymore. The
>      source may have put sensible data into the message without a separate
>      copy and the target may want to parse the message inline, to avoid a
>      local copy.
> 
> While SIGBUS handling, POSIX mandatory locking and MAP_DENYWRITE provide
> ways to achieve most of this, the first one is unproportionally ugly to
> use in libraries and the latter two are broken/racy or even disabled due
> to denial of service attacks.
> 
> This patch introduces the concept of SEALING. If you seal a file, a
> specific set of operations is blocked on that file forever.
> Unlike locks, seals can only be set, never removed. Hence, once you
> verified a specific set of seals is set, you're guaranteed that no-one can
> perform the blocked operations on this file, anymore.
> 
> An initial set of SEALS is introduced by this patch:
>   - SHRINK: If SEAL_SHRINK is set, the file in question cannot be reduced
>             in size. This affects ftruncate() and open(O_TRUNC).
>   - GROW: If SEAL_GROW is set, the file in question cannot be increased
>           in size. This affects ftruncate(), fallocate() and write().
>   - WRITE: If SEAL_WRITE is set, no write operations (besides resizing)
>            are possible. This affects fallocate(PUNCH_HOLE), mmap() and
>            write().
>   - SEAL: If SEAL_SEAL is set, no further seals can be added to a file.
>           This basically prevents the F_ADD_SEAL operation on a file and
>           can be set to prevent others from adding further seals that you
>           don't want.
> 
> The described use-cases can easily use these seals to provide safe use
> without any trust-relationship:
>   1) The graphics server can verify that a passed file-descriptor has
>      SEAL_SHRINK set. This allows safe scanout, while the client is
>      allowed to increase buffer size for window-resizing on-the-fly.
>      Concurrent writes are explicitly allowed.
>   2) For general-purpose IPC, both processes can verify that SEAL_SHRINK,
>      SEAL_GROW and SEAL_WRITE are set. This guarantees that neither
>      process can modify the data while the other side parses it.
>      Furthermore, it guarantees that even with writable FDs passed to the
>      peer, it cannot increase the size to hit memory-limits of the source
>      process (in case the file-storage is accounted to the source).
> 
> The new API is an extension to fcntl(), adding two new commands:
>   F_GET_SEALS: Return a bitset describing the seals on the file. This
>                can be called on any FD if the underlying file supports
>                sealing.
>   F_ADD_SEALS: Change the seals of a given file. This requires WRITE
>                access to the file and F_SEAL_SEAL may not already be set.
>                Furthermore, the underlying file must support sealing and
>                there may not be any existing shared mapping of that file.
>                Otherwise, EBADF/EPERM is returned.
>                The given seals are _added_ to the existing set of seals
>                on the file. You cannot remove seals again.
> 
> The fcntl() handler is currently specific to shmem and disabled on all
> files. A file needs to explicitly support sealing for this interface to
> work. A separate syscall is added in a follow-up, which creates files that
> support sealing. There is no intention to support this on other
> file-systems. Semantics are unclear for non-volatile files and we lack any
> use-case right now. Therefore, the implementation is specific to shmem.
> 
> Signed-off-by: David Herrmann <dh.herrmann@xxxxxxxxx>

Acked-by: Hugh Dickins <hughd@xxxxxxxxxx>

We've just changed the context lines of your hunk to shmem_fallocate(),
sorry, but the reject is easily fixed up in the obvious way.

> ---
>  fs/fcntl.c                 |   5 ++
>  include/linux/shmem_fs.h   |  17 ++++++
>  include/uapi/linux/fcntl.h |  15 +++++
>  mm/shmem.c                 | 143 +++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 180 insertions(+)
> 
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 72c82f6..22d1c3d 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -21,6 +21,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/pid_namespace.h>
>  #include <linux/user_namespace.h>
> +#include <linux/shmem_fs.h>
>  
>  #include <asm/poll.h>
>  #include <asm/siginfo.h>
> @@ -336,6 +337,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>  	case F_GETPIPE_SZ:
>  		err = pipe_fcntl(filp, cmd, arg);
>  		break;
> +	case F_ADD_SEALS:
> +	case F_GET_SEALS:
> +		err = shmem_fcntl(filp, cmd, arg);
> +		break;
>  	default:
>  		break;
>  	}
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 4d1771c..50777b5 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -1,6 +1,7 @@
>  #ifndef __SHMEM_FS_H
>  #define __SHMEM_FS_H
>  
> +#include <linux/file.h>
>  #include <linux/swap.h>
>  #include <linux/mempolicy.h>
>  #include <linux/pagemap.h>
> @@ -11,6 +12,7 @@
>  
>  struct shmem_inode_info {
>  	spinlock_t		lock;
> +	unsigned int		seals;		/* shmem seals */
>  	unsigned long		flags;
>  	unsigned long		alloced;	/* data pages alloced to file */
>  	union {
> @@ -65,4 +67,19 @@ static inline struct page *shmem_read_mapping_page(
>  					mapping_gfp_mask(mapping));
>  }
>  
> +#ifdef CONFIG_TMPFS
> +
> +extern int shmem_add_seals(struct file *file, unsigned int seals);
> +extern int shmem_get_seals(struct file *file);
> +extern long shmem_fcntl(struct file *file, unsigned int cmd, unsigned long arg);
> +
> +#else
> +
> +static inline long shmem_fcntl(struct file *f, unsigned int c, unsigned long a)
> +{
> +	return -EINVAL;
> +}
> +
> +#endif
> +
>  #endif
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index 074b886..beed138 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -28,6 +28,21 @@
>  #define F_GETPIPE_SZ	(F_LINUX_SPECIFIC_BASE + 8)
>  
>  /*
> + * Set/Get seals
> + */
> +#define F_ADD_SEALS	(F_LINUX_SPECIFIC_BASE + 9)
> +#define F_GET_SEALS	(F_LINUX_SPECIFIC_BASE + 10)
> +
> +/*
> + * Types of seals
> + */
> +#define F_SEAL_SEAL	0x0001	/* prevent further seals from being set */
> +#define F_SEAL_SHRINK	0x0002	/* prevent file from shrinking */
> +#define F_SEAL_GROW	0x0004	/* prevent file from growing */
> +#define F_SEAL_WRITE	0x0008	/* prevent writes */
> +/* (1U << 31) is reserved for signed error codes */
> +
> +/*
>   * Types of directory notifications that may be requested.
>   */
>  #define DN_ACCESS	0x00000001	/* File accessed */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 1140f49..51dccd0 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -66,6 +66,7 @@ static struct vfsmount *shm_mnt;
>  #include <linux/highmem.h>
>  #include <linux/seq_file.h>
>  #include <linux/magic.h>
> +#include <linux/fcntl.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm/pgtable.h>
> @@ -532,6 +533,7 @@ EXPORT_SYMBOL_GPL(shmem_truncate_range);
>  static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
>  {
>  	struct inode *inode = dentry->d_inode;
> +	struct shmem_inode_info *info = SHMEM_I(inode);
>  	int error;
>  
>  	error = inode_change_ok(inode, attr);
> @@ -542,6 +544,11 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
>  		loff_t oldsize = inode->i_size;
>  		loff_t newsize = attr->ia_size;
>  
> +		/* protected by i_mutex */
> +		if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
> +		    (newsize > oldsize && (info->seals & F_SEAL_GROW)))
> +			return -EPERM;
> +
>  		if (newsize != oldsize) {
>  			i_size_write(inode, newsize);
>  			inode->i_ctime = inode->i_mtime = CURRENT_TIME;
> @@ -1364,6 +1371,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
>  		info = SHMEM_I(inode);
>  		memset(info, 0, (char *)inode - (char *)info);
>  		spin_lock_init(&info->lock);
> +		info->seals = F_SEAL_SEAL;
>  		info->flags = flags & VM_NORESERVE;
>  		INIT_LIST_HEAD(&info->swaplist);
>  		simple_xattrs_init(&info->xattrs);
> @@ -1422,7 +1430,17 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
>  			struct page **pagep, void **fsdata)
>  {
>  	struct inode *inode = mapping->host;
> +	struct shmem_inode_info *info = SHMEM_I(inode);
>  	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
> +
> +	/* i_mutex is held by caller */
> +	if (unlikely(info->seals)) {
> +		if (info->seals & F_SEAL_WRITE)
> +			return -EPERM;
> +		if ((info->seals & F_SEAL_GROW) && pos + len > inode->i_size)
> +			return -EPERM;
> +	}
> +
>  	return shmem_getpage(inode, index, pagep, SGP_WRITE, NULL);
>  }
>  
> @@ -1760,11 +1778,125 @@ static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence)
>  	return offset;
>  }
>  
> +static int shmem_wait_for_pins(struct address_space *mapping)
> +{
> +	return 0;
> +}
> +
> +#define F_ALL_SEALS (F_SEAL_SEAL | \
> +		     F_SEAL_SHRINK | \
> +		     F_SEAL_GROW | \
> +		     F_SEAL_WRITE)
> +
> +int shmem_add_seals(struct file *file, unsigned int seals)
> +{
> +	struct inode *inode = file_inode(file);
> +	struct shmem_inode_info *info = SHMEM_I(inode);
> +	int error;
> +
> +	/*
> +	 * SEALING
> +	 * Sealing allows multiple parties to share a shmem-file but restrict
> +	 * access to a specific subset of file operations. Seals can only be
> +	 * added, but never removed. This way, mutually untrusted parties can
> +	 * share common memory regions with a well-defined policy. A malicious
> +	 * peer can thus never perform unwanted operations on a shared object.
> +	 *
> +	 * Seals are only supported on special shmem-files and always affect
> +	 * the whole underlying inode. Once a seal is set, it may prevent some
> +	 * kinds of access to the file. Currently, the following seals are
> +	 * defined:
> +	 *   SEAL_SEAL: Prevent further seals from being set on this file
> +	 *   SEAL_SHRINK: Prevent the file from shrinking
> +	 *   SEAL_GROW: Prevent the file from growing
> +	 *   SEAL_WRITE: Prevent write access to the file
> +	 *
> +	 * As we don't require any trust relationship between two parties, we
> +	 * must prevent seals from being removed. Therefore, sealing a file
> +	 * only adds a given set of seals to the file, it never touches
> +	 * existing seals. Furthermore, the "setting seals"-operation can be
> +	 * sealed itself, which basically prevents any further seal from being
> +	 * added.
> +	 *
> +	 * Semantics of sealing are only defined on volatile files. Only
> +	 * anonymous shmem files support sealing. More importantly, seals are
> +	 * never written to disk. Therefore, there's no plan to support it on
> +	 * other file types.
> +	 */
> +
> +	if (file->f_op != &shmem_file_operations)
> +		return -EINVAL;
> +	if (!(file->f_mode & FMODE_WRITE))
> +		return -EPERM;
> +	if (seals & ~(unsigned int)F_ALL_SEALS)
> +		return -EINVAL;
> +
> +	mutex_lock(&inode->i_mutex);
> +
> +	if (info->seals & F_SEAL_SEAL) {
> +		error = -EPERM;
> +		goto unlock;
> +	}
> +
> +	if ((seals & F_SEAL_WRITE) && !(info->seals & F_SEAL_WRITE)) {
> +		error = mapping_deny_writable(file->f_mapping);
> +		if (error)
> +			goto unlock;
> +
> +		error = shmem_wait_for_pins(file->f_mapping);
> +		if (error) {
> +			mapping_allow_writable(file->f_mapping);
> +			goto unlock;
> +		}
> +	}
> +
> +	info->seals |= seals;
> +	error = 0;
> +
> +unlock:
> +	mutex_unlock(&inode->i_mutex);
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(shmem_add_seals);
> +
> +int shmem_get_seals(struct file *file)
> +{
> +	if (file->f_op != &shmem_file_operations)
> +		return -EINVAL;
> +
> +	return SHMEM_I(file_inode(file))->seals;
> +}
> +EXPORT_SYMBOL_GPL(shmem_get_seals);
> +
> +long shmem_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
> +{
> +	long error;
> +
> +	switch (cmd) {
> +	case F_ADD_SEALS:
> +		/* disallow upper 32bit */
> +		if (arg > UINT_MAX)
> +			return -EINVAL;
> +
> +		error = shmem_add_seals(file, arg);
> +		break;
> +	case F_GET_SEALS:
> +		error = shmem_get_seals(file);
> +		break;
> +	default:
> +		error = -EINVAL;
> +		break;
> +	}
> +
> +	return error;
> +}
> +
>  static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  							 loff_t len)
>  {
>  	struct inode *inode = file_inode(file);
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> +	struct shmem_inode_info *info = SHMEM_I(inode);
>  	struct shmem_falloc shmem_falloc;
>  	pgoff_t start, index, end;
>  	int error;
> @@ -1781,6 +1913,12 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  		loff_t unmap_start = round_up(offset, PAGE_SIZE);
>  		loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
>  
> +		/* protected by i_mutex */
> +		if (info->seals & F_SEAL_WRITE) {
> +			error = -EPERM;
> +			goto out;
> +		}
> +
>  		shmem_falloc.start = unmap_start >> PAGE_SHIFT;
>  		shmem_falloc.next = (unmap_end + 1) >> PAGE_SHIFT;
>  		spin_lock(&inode->i_lock);
> @@ -1801,6 +1939,11 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  	if (error)
>  		goto out;
>  
> +	if ((info->seals & F_SEAL_GROW) && offset + len > inode->i_size) {
> +		error = -EPERM;
> +		goto out;
> +	}
> +
>  	start = offset >> PAGE_CACHE_SHIFT;
>  	end = (offset + len + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
>  	/* Try to avoid a swapstorm if len is impossible to satisfy */
> -- 
> 2.0.2
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux