On Jul 7, 2016, at 1:27 PM, Trond Myklebust wrote: > >> On Jul 7, 2016, at 13:07, Oleg Drokin <green@xxxxxxxxxxxxxx> wrote: >> >> >> On Jul 7, 2016, at 12:59 PM, Trond Myklebust wrote: >> >>> >>>> On Jul 7, 2016, at 12:52, Oleg Drokin <green@xxxxxxxxxxxxxx> wrote: >>>> >>>> >>>> On Jul 7, 2016, at 12:16 PM, Trond Myklebust wrote: >>>> >>>>> >>>>>> On Jul 7, 2016, at 01:53, Oleg Drokin <green@xxxxxxxxxxxxxx> wrote: >>>>>> >>>>>> It's great when we can shave an extra RPC, but not at the expense >>>>>> of correctness. >>>>>> We should not return EPERM (from vfs_create/mknod/mkdir) if the >>>>>> name already exists, even if we have no write access in parent. >>>>>> >>>>>> Since the check in nfs_permission is clearly not enough to stave >>>>>> off this, just throw in the extra READ access to actually >>>>>> go through. >>>>>> >>>>>> Signed-off-by: Oleg Drokin <green@xxxxxxxxxxxxxx> >>>>>> --- >>>>>> fs/nfs/dir.c | 4 +++- >>>>>> 1 file changed, 3 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c >>>>>> index d8015a0..8c7835b 100644 >>>>>> --- a/fs/nfs/dir.c >>>>>> +++ b/fs/nfs/dir.c >>>>>> @@ -1383,8 +1383,10 @@ struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, unsigned in >>>>>> /* >>>>>> * If we're doing an exclusive create, optimize away the lookup >>>>>> * but don't hash the dentry. >>>>>> + * This optimization only works if we can write in the parent. >>>>>> */ >>>>>> - if (nfs_is_exclusive_create(dir, flags)) >>>>>> + if (nfs_is_exclusive_create(dir, flags) && >>>>>> + (inode_permission(dir, MAY_WRITE | MAY_READ | MAY_EXEC) == 0)) >>>>>> return NULL; >>>>>> >>>>> >>>>> NACK. The only write permission we should care about on the client side is whether or not the filesystem is mounted read-only. All other permissions are checked by the server. >>>> >>>> Right. This was mostly a discussion piece. >>>> The problem here is nfs_permission() returns 0 if you check for >>>> inode_permission(dir, MAY_WRITE | MAY_EXEC) (as in may_create), but then >>>> some other checks in the kernel still catch on to the fact that the directory >>>> is not writeable, so we have a premature failure with EPERM and server never sees >>>> this request which breaks things. >>> >>> Are these VFS level checks? Which ones? >> >> Yes, VFS level I believe. >> For Lustre it's may_create() from vfs_mkdir() that stops us short >> and the Lustre patch is effective. >> but for NFS this must be something else and I did not trace >> it completely. One of the security checks, I guess? >> if NFS patch is changed to check inode_permission(dir, MAY_OPEN | MAY_EXEC) >> as in Lustre, that returns 0 no matter what. >> >> This is trivial to reproduce too. > > So, should we be ignoring the MAY_EXEC flag when the VFS asks for inode_permission(dir, MAY_WRITE|MAY_EXEC)? I suspect that is the problem here. You know, I have been wrong here. VFS is not in the way. it's the NFS server that's returning us the error: [ 326.069066] NFS: mkdir(0:44/12), test [ 326.070510] nfsd_dispatch: vers 3 proc 9 [ 326.071612] nfsd: MKDIR(3) 16: 01010001 00000000 0000000c 4b3b7a92 00000000 00000000 test [ 326.071617] nfsd: fh_verify(16: 01010001 00000000 0000000c 4b3b7a92 00000000 00000000) [ 326.071684] fh_verify: /racer permission failure, acc=3, error=13 So I guess nfsd wants to be lazy too? What side do you think this is best fixed then? the client or the server?-- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html