On Tue, 4 Feb 2014 16:03:14 +0400 Pavel Shilovsky <piastry@xxxxxxxxxxx> wrote: > 2014-02-01 Jeff Layton <jlayton@xxxxxxxxxx>: > > On Fri, 17 Jan 2014 14:07:06 +0400 > > Pavel Shilovsky <piastry@xxxxxxxxxxx> wrote: > > > >> This patch adds 3 flags: > >> 1) O_DENYREAD that doesn't permit read access, > >> 2) O_DENYWRITE that doesn't permit write access, > >> 3) O_DENYDELETE that doesn't permit delete or rename. > >> > >> Network filesystems CIFS, SMB2.0, SMB3.0 and NFSv4 have such flags - > >> this change can benefit cifs and nfs modules as well as Samba and > >> NFS file servers that export the same directory for Windows clients, > >> or Wine applications that access the same files simultaneously. > >> > >> These flags are only take affect for opens on mounts with new sharelock > >> option. They are translated to flock's flags: > >> > >> !O_DENYREAD -> LOCK_READ | LOCK_MAND > >> !O_DENYWRITE -> LOCK_WRITE | LOCK_MAND > >> > >> and set through flock_lock_file on a file. If the file can't be locked > >> due conflicts with another open with O_DENY* flags, a new -ESHAREDENIED > >> error code is returned. > >> > >> Create codepath is slightly changed to prevent data races on newly > >> created files: when open with O_CREAT can return -ESHAREDENIED error > >> for successfully created files due to a sharelock set by another task. > >> > >> Temporary disable O_DENYDELETE support - will enable it in further > >> patches. > >> > >> Signed-off-by: Pavel Shilovsky <piastry@xxxxxxxxxxx> > >> --- > >> arch/alpha/include/uapi/asm/errno.h | 2 + > >> arch/alpha/include/uapi/asm/fcntl.h | 3 ++ > >> arch/mips/include/uapi/asm/errno.h | 2 + > >> arch/parisc/include/uapi/asm/errno.h | 2 + > >> arch/parisc/include/uapi/asm/fcntl.h | 3 ++ > >> arch/sparc/include/uapi/asm/errno.h | 2 + > >> arch/sparc/include/uapi/asm/fcntl.h | 3 ++ > >> fs/fcntl.c | 5 +- > >> fs/locks.c | 97 +++++++++++++++++++++++++++++++--- > >> fs/namei.c | 53 ++++++++++++++++++- > >> fs/proc_namespace.c | 1 + > >> include/linux/fs.h | 8 +++ > >> include/uapi/asm-generic/errno.h | 2 + > >> include/uapi/asm-generic/fcntl.h | 11 ++++ > >> include/uapi/linux/fs.h | 1 + > >> 15 files changed, 185 insertions(+), 10 deletions(-) > >> > > > > You might consider breaking this patch into two. One patch that makes > > LOCK_MAND locks actually work and that adds MS_SHARELOCK, and one patch > > that hooks that up to open(). Given the locking involved with the > > i_mutex it would be best to present this as a series of small, > > incremental changes. > > Good point. So, we can break it into 2: > 1) make flock actually work with LOCK_MAND on MS_SHARELOCK mounts, > 2) replace flock+LOCK_MAND with open+O_DENY* flags. > > > >> diff --git a/arch/alpha/include/uapi/asm/errno.h b/arch/alpha/include/uapi/asm/errno.h > >> index 17f92aa..953a6d6 100644 > >> --- a/arch/alpha/include/uapi/asm/errno.h > >> +++ b/arch/alpha/include/uapi/asm/errno.h > >> @@ -124,4 +124,6 @@ > >> > >> #define EHWPOISON 139 /* Memory page has hardware error */ > >> > >> +#define ESHAREDENIED 140 /* File is locked with a sharelock */ > >> + > >> #endif > >> diff --git a/arch/alpha/include/uapi/asm/fcntl.h b/arch/alpha/include/uapi/asm/fcntl.h > >> index 09f49a6..265344b 100644 > >> --- a/arch/alpha/include/uapi/asm/fcntl.h > >> +++ b/arch/alpha/include/uapi/asm/fcntl.h > >> @@ -33,6 +33,9 @@ > >> > >> #define O_PATH 040000000 > >> #define __O_TMPFILE 0100000000 > >> +#define O_DENYREAD 0200000000 /* Do not permit read access */ > >> +#define O_DENYWRITE 0400000000 /* Do not permit write access */ > >> +#define O_DENYDELETE 01000000000 /* Do not permit delete or rename */ > >> > >> #define F_GETLK 7 > >> #define F_SETLK 8 > >> diff --git a/arch/mips/include/uapi/asm/errno.h b/arch/mips/include/uapi/asm/errno.h > >> index 02d645d..f1a4068 100644 > >> --- a/arch/mips/include/uapi/asm/errno.h > >> +++ b/arch/mips/include/uapi/asm/errno.h > >> @@ -123,6 +123,8 @@ > >> > >> #define EHWPOISON 168 /* Memory page has hardware error */ > >> > >> +#define ESHAREDENIED 169 /* File is locked with a sharelock */ > >> + > >> #define EDQUOT 1133 /* Quota exceeded */ > >> > >> > >> diff --git a/arch/parisc/include/uapi/asm/errno.h b/arch/parisc/include/uapi/asm/errno.h > >> index f3a8aa5..654c232 100644 > >> --- a/arch/parisc/include/uapi/asm/errno.h > >> +++ b/arch/parisc/include/uapi/asm/errno.h > >> @@ -124,4 +124,6 @@ > >> > >> #define EHWPOISON 257 /* Memory page has hardware error */ > >> > >> +#define ESHAREDENIED 258 /* File is locked with a sharelock */ > >> + > >> #endif > >> diff --git a/arch/parisc/include/uapi/asm/fcntl.h b/arch/parisc/include/uapi/asm/fcntl.h > >> index 34a46cb..5865964 100644 > >> --- a/arch/parisc/include/uapi/asm/fcntl.h > >> +++ b/arch/parisc/include/uapi/asm/fcntl.h > >> @@ -21,6 +21,9 @@ > >> > >> #define O_PATH 020000000 > >> #define __O_TMPFILE 040000000 > >> +#define O_DENYREAD 0200000000 /* Do not permit read access */ > >> +#define O_DENYWRITE 0400000000 /* Do not permit write access */ > >> +#define O_DENYDELETE 01000000000 /* Do not permit delete or rename */ > >> > >> #define F_GETLK64 8 > >> #define F_SETLK64 9 > >> diff --git a/arch/sparc/include/uapi/asm/errno.h b/arch/sparc/include/uapi/asm/errno.h > >> index 20423e17..fe339b5 100644 > >> --- a/arch/sparc/include/uapi/asm/errno.h > >> +++ b/arch/sparc/include/uapi/asm/errno.h > >> @@ -114,4 +114,6 @@ > >> > >> #define EHWPOISON 135 /* Memory page has hardware error */ > >> > >> +#define ESHAREDENIED 136 /* File is locked with a sharelock */ > >> + > >> #endif > >> diff --git a/arch/sparc/include/uapi/asm/fcntl.h b/arch/sparc/include/uapi/asm/fcntl.h > >> index 7e8ace5..ab68170 100644 > >> --- a/arch/sparc/include/uapi/asm/fcntl.h > >> +++ b/arch/sparc/include/uapi/asm/fcntl.h > >> @@ -36,6 +36,9 @@ > >> > >> #define O_PATH 0x1000000 > >> #define __O_TMPFILE 0x2000000 > >> +#define O_DENYREAD 0x4000000 /* Do not permit read access */ > >> +#define O_DENYWRITE 0x8000000 /* Do not permit write access */ > >> +#define O_DENYDELETE 0x10000000 /* Do not permit delete or rename */ > >> > > > > It'd probably be best to add O_DENYDELETE in a separate patch, rather > > than disabling it temporarily. > > Agree. > > > > >> #define F_GETOWN 5 /* for sockets. */ > >> #define F_SETOWN 6 /* for sockets. */ > >> diff --git a/fs/fcntl.c b/fs/fcntl.c > >> index ef68665..3f85887 100644 > >> --- a/fs/fcntl.c > >> +++ b/fs/fcntl.c > >> @@ -729,14 +729,15 @@ static int __init fcntl_init(void) > >> * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY > >> * is defined as O_NONBLOCK on some platforms and not on others. > >> */ > >> - BUILD_BUG_ON(20 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32( > >> + BUILD_BUG_ON(23 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32( > >> O_RDONLY | O_WRONLY | O_RDWR | > >> O_CREAT | O_EXCL | O_NOCTTY | > >> O_TRUNC | O_APPEND | /* O_NONBLOCK | */ > >> __O_SYNC | O_DSYNC | FASYNC | > >> O_DIRECT | O_LARGEFILE | O_DIRECTORY | > >> O_NOFOLLOW | O_NOATIME | O_CLOEXEC | > >> - __FMODE_EXEC | O_PATH | __O_TMPFILE > >> + __FMODE_EXEC | O_PATH | __O_TMPFILE | > >> + O_DENYREAD | O_DENYWRITE | O_DENYDELETE > >> )); > >> > >> fasync_cache = kmem_cache_create("fasync_cache", > >> diff --git a/fs/locks.c b/fs/locks.c > >> index 92a0f0a..ffde4d4 100644 > >> --- a/fs/locks.c > >> +++ b/fs/locks.c > >> @@ -708,20 +708,73 @@ static int posix_locks_conflict(struct file_lock *caller_fl, struct file_lock *s > >> return (locks_conflict(caller_fl, sys_fl)); > >> } > >> > >> -/* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific > >> - * checking before calling the locks_conflict(). > >> +static unsigned int > >> +deny_flags_to_cmd(unsigned int flags) > >> +{ > >> + unsigned int cmd = LOCK_MAND; > >> + > >> + if (!(flags & O_DENYREAD)) > >> + cmd |= LOCK_READ; > >> + if (!(flags & O_DENYWRITE)) > >> + cmd |= LOCK_WRITE; > >> + > >> + return cmd; > >> +} > >> + > >> +/* > >> + * locks_mand_conflict - Determine if there's a share reservation conflict > >> + * @caller_fl: lock we're attempting to acquire > >> + * @sys_fl: lock already present on system that we're checking against > >> + * > >> + * Check to see if there's a share_reservation conflict. LOCK_READ/LOCK_WRITE > >> + * tell us whether the reservation allows other readers and writers. > >> + */ > >> +static int > >> +locks_mand_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl) > >> +{ > >> + unsigned char caller_type = caller_fl->fl_type; > >> + unsigned char sys_type = sys_fl->fl_type; > >> + fmode_t caller_fmode = caller_fl->fl_file->f_mode; > >> + fmode_t sys_fmode = sys_fl->fl_file->f_mode; > >> + > >> + /* they can only conflict if FS is mounted with MS_SHARELOCK */ > >> + if (!IS_SHARELOCK(caller_fl->fl_file->f_path.dentry->d_inode)) > >> + return 0; > >> + > >> + /* they can only conflict if they're both LOCK_MAND */ > >> + if (!(caller_type & LOCK_MAND) || !(sys_type & LOCK_MAND)) > >> + return 0; > >> + > >> + if (!(caller_type & LOCK_READ) && (sys_fmode & FMODE_READ)) > >> + return 1; > >> + if (!(caller_type & LOCK_WRITE) && (sys_fmode & FMODE_WRITE)) > >> + return 1; > >> + if (!(sys_type & LOCK_READ) && (caller_fmode & FMODE_READ)) > >> + return 1; > >> + if (!(sys_type & LOCK_WRITE) && (caller_fmode & FMODE_WRITE)) > >> + return 1; > >> + > >> + return 0; > >> +} > >> + > >> +/* > >> + * Determine if lock sys_fl blocks lock caller_fl. FLOCK specific checking > >> + * before calling the locks_conflict(). > >> */ > >> static int flock_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl) > >> { > >> - /* FLOCK locks referring to the same filp do not conflict with > >> + if (!IS_FLOCK(sys_fl)) > >> + return 0; > >> + if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND)) > >> + return locks_mand_conflict(caller_fl, sys_fl); > > > > nit: Seems like the above could be optimized a little. You know that > > locks_mand_conflict is only relevant if both are LOCK_MAND, and one of > > the first things that locks_mand_conflict does is to check that both > > have that set. > > ok. > > > > >> + /* > >> + * FLOCK locks referring to the same filp do not conflict with > >> * each other. > >> */ > >> - if (!IS_FLOCK(sys_fl) || (caller_fl->fl_file == sys_fl->fl_file)) > >> - return (0); > >> - if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND)) > >> + if (caller_fl->fl_file == sys_fl->fl_file) > >> return 0; > >> > >> - return (locks_conflict(caller_fl, sys_fl)); > >> + return locks_conflict(caller_fl, sys_fl); > >> } > >> > >> void > >> @@ -888,6 +941,36 @@ out: > >> return error; > >> } > >> > >> +/* > >> + * Determine if a file is allowed to be opened with specified access and share > >> + * modes. Lock the file and return 0 if checks passed, otherwise return > >> + * -ESHAREDENIED. > >> + */ > >> +int > >> +sharelock_lock_file(struct file *filp) > >> +{ > >> + struct file_lock *lock; > >> + int error = 0; > >> + > >> + if (!IS_SHARELOCK(filp->f_path.dentry->d_inode)) > >> + return error; > >> + > >> + /* Disable O_DENYDELETE support for now */ > >> + if (filp->f_flags & O_DENYDELETE) > >> + return -EINVAL; > >> + > >> + error = flock_make_lock(filp, &lock, deny_flags_to_cmd(filp->f_flags)); > >> + if (error) > >> + return error; > >> + > >> + error = flock_lock_file(filp, lock); > >> + if (error == -EAGAIN) > >> + error = -ESHAREDENIED; > >> + > >> + locks_free_lock(lock); > >> + return error; > >> +} > >> + > >> static int __posix_lock_file(struct inode *inode, struct file_lock *request, struct file_lock *conflock) > >> { > >> struct file_lock *fl; > >> diff --git a/fs/namei.c b/fs/namei.c > >> index 3531dee..2b741a1 100644 > >> --- a/fs/namei.c > >> +++ b/fs/namei.c > >> @@ -2725,9 +2725,14 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry, > >> acc_mode = MAY_OPEN; > >> } > >> error = may_open(&file->f_path, acc_mode, open_flag); > >> - if (error) > >> + if (error) { > >> fput(file); > >> + goto out; > >> + } > >> > >> + error = sharelock_lock_file(file); > >> + if (error) > >> + fput(file); > >> out: > >> dput(dentry); > >> return error; > >> @@ -2919,6 +2924,40 @@ retry_lookup: > >> } > >> mutex_lock(&dir->d_inode->i_mutex); > >> error = lookup_open(nd, path, file, op, got_write, opened); > >> + > >> + /* > >> + * For sharelock mounts if a file was created but not opened, we need > >> + * to keep parent i_mutex until we finish the open to prevent races when > >> + * somebody opens newly created by us file and locks it with a sharelock > >> + * before we open it. > >> + */ > >> + if (IS_SHARELOCK(dir->d_inode) && error > 0 && *opened & FILE_CREATED) { > >> + /* Don't check for write permission, don't truncate */ > >> + open_flag &= ~O_TRUNC; > >> + will_truncate = false; > >> + acc_mode = MAY_OPEN; > >> + path_to_nameidata(path, nd); > >> + > >> + error = may_open(&nd->path, acc_mode, open_flag); > >> + if (error) { > >> + mutex_unlock(&dir->d_inode->i_mutex); > >> + goto out; > >> + } > >> + file->f_path.mnt = nd->path.mnt; > >> + error = finish_open(file, nd->path.dentry, NULL, opened); > >> + if (error) { > >> + mutex_unlock(&dir->d_inode->i_mutex); > >> + if (error == -EOPENSTALE) > >> + goto stale_open; > >> + goto out; > >> + } > >> + error = sharelock_lock_file(file); > >> + mutex_unlock(&dir->d_inode->i_mutex); > >> + if (error) > >> + goto exit_fput; > >> + goto opened; > >> + } > >> + > >> mutex_unlock(&dir->d_inode->i_mutex); > >> > >> if (error <= 0) { > >> @@ -3034,6 +3073,18 @@ finish_open_created: > >> goto stale_open; > >> goto out; > >> } > >> + > >> + if (IS_SHARELOCK(dir->d_inode)) { > >> + /* > >> + * Lock parent i_mutex to prevent races with sharelocks on > >> + * newly created files. > >> + */ > >> + mutex_lock(&dir->d_inode->i_mutex); > >> + error = sharelock_lock_file(file); > >> + mutex_unlock(&dir->d_inode->i_mutex); > >> + if (error) > >> + goto exit_fput; > >> + } > >> opened: > >> error = open_check_o_direct(file); > >> if (error) > >> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c > >> index 439406e..dd374d4 100644 > >> --- a/fs/proc_namespace.c > >> +++ b/fs/proc_namespace.c > >> @@ -44,6 +44,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb) > >> { MS_SYNCHRONOUS, ",sync" }, > >> { MS_DIRSYNC, ",dirsync" }, > >> { MS_MANDLOCK, ",mand" }, > >> + { MS_SHARELOCK, ",sharelock" }, > >> { 0, NULL } > >> }; > >> const struct proc_fs_info *fs_infop; > >> diff --git a/include/linux/fs.h b/include/linux/fs.h > >> index 121f11f..aa061ca 100644 > >> --- a/include/linux/fs.h > >> +++ b/include/linux/fs.h > >> @@ -1029,6 +1029,7 @@ extern int vfs_setlease(struct file *, long, struct file_lock **); > >> extern int lease_modify(struct file_lock **, int); > >> extern int lock_may_read(struct inode *, loff_t start, unsigned long count); > >> extern int lock_may_write(struct inode *, loff_t start, unsigned long count); > >> +extern int sharelock_lock_file(struct file *); > >> #else /* !CONFIG_FILE_LOCKING */ > >> static inline int fcntl_getlk(struct file *file, struct flock __user *user) > >> { > >> @@ -1169,6 +1170,12 @@ static inline int lock_may_write(struct inode *inode, loff_t start, > >> { > >> return 1; > >> } > >> + > >> +static inline int sharelock_lock_file(struct file *filp) > >> +{ > >> + return 0; > >> +} > >> + > >> #endif /* !CONFIG_FILE_LOCKING */ > >> > >> > >> @@ -1675,6 +1682,7 @@ struct super_operations { > >> #define IS_PRIVATE(inode) ((inode)->i_flags & S_PRIVATE) > >> #define IS_IMA(inode) ((inode)->i_flags & S_IMA) > >> #define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT) > >> +#define IS_SHARELOCK(inode) __IS_FLG(inode, MS_SHARELOCK) > >> #define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC) > >> > >> /* > >> diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h > >> index 1e1ea6e..aff869c 100644 > >> --- a/include/uapi/asm-generic/errno.h > >> +++ b/include/uapi/asm-generic/errno.h > >> @@ -110,4 +110,6 @@ > >> > >> #define EHWPOISON 133 /* Memory page has hardware error */ > >> > >> +#define ESHAREDENIED 134 /* File is locked with a sharelock */ > >> + > >> #endif > >> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h > >> index 95e46c8..9881cfe 100644 > >> --- a/include/uapi/asm-generic/fcntl.h > >> +++ b/include/uapi/asm-generic/fcntl.h > >> @@ -92,6 +92,17 @@ > >> #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY) > >> #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT) > >> > >> +#ifndef O_DENYREAD > >> +#define O_DENYREAD 040000000 /* Do not permit read access */ > >> +#endif > >> +/* FMODE_NONOTIFY 0100000000 */ > >> +#ifndef O_DENYWRITE > >> +#define O_DENYWRITE 0200000000 /* Do not permit write access */ > >> +#endif > >> +#ifndef O_DENYDELETE > >> +#define O_DENYDELETE 0400000000 /* Do not permit delete or rename */ > >> +#endif > >> + > > > > One thing to consider: We found with the addition of O_TMPFILE that the > > open() api is not particularly helpful when it comes to informing > > appications when a flag isn't supported: > > > > http://lwn.net/Articles/562294/ > > > > ...having a plan to cope with that here would be best. How can an > > application determine at runtime that O_DENY* actually *work*? It may > > be best to step back and consider a new syscall for this (open2() ?). > > > > So, consider we added new syscall: > > opendm(filename, flags, mode, deny_mode) > { > return open(filename, flags | denymode2openflags(deny_mode), mode) > } > > where deny_mode can be DMODE_NONE (0), DMODE_READ (1), DMODE_WRITE(2) > and DMODE_RDWR(3) (similar to FMODE_* values). > > We have open and opendm that act actually in the same manner for > mounts without MS_SHARELOCK. For mounts with MS_SHARELOCK open is like > opendm with DMODE_NONE. Open flags O_DENY* are for internal use only. > > Is it what you suggest? > Right, something that like that maybe... ...or possibly consider not making this specific to deny modes or anything, and just consider adding a new generic "openat2()" syscall. This one could have a larger (or maybe extensible) field for flags, and well-defined behavior when presented with a flag that it doesn't understand. I realize that that's a large increase in scope, but it can often be easier to get new features merged if you are simultaneously addressing other problems that exist. ;) -- Jeff Layton <jlayton@xxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html