On Jul 7, 2016, at 12:59 PM, Trond Myklebust wrote: > >> On Jul 7, 2016, at 12:52, Oleg Drokin <green@xxxxxxxxxxxxxx> wrote: >> >> >> On Jul 7, 2016, at 12:16 PM, Trond Myklebust wrote: >> >>> >>>> On Jul 7, 2016, at 01:53, Oleg Drokin <green@xxxxxxxxxxxxxx> wrote: >>>> >>>> It's great when we can shave an extra RPC, but not at the expense >>>> of correctness. >>>> We should not return EPERM (from vfs_create/mknod/mkdir) if the >>>> name already exists, even if we have no write access in parent. >>>> >>>> Since the check in nfs_permission is clearly not enough to stave >>>> off this, just throw in the extra READ access to actually >>>> go through. >>>> >>>> Signed-off-by: Oleg Drokin <green@xxxxxxxxxxxxxx> >>>> --- >>>> fs/nfs/dir.c | 4 +++- >>>> 1 file changed, 3 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c >>>> index d8015a0..8c7835b 100644 >>>> --- a/fs/nfs/dir.c >>>> +++ b/fs/nfs/dir.c >>>> @@ -1383,8 +1383,10 @@ struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, unsigned in >>>> /* >>>> * If we're doing an exclusive create, optimize away the lookup >>>> * but don't hash the dentry. >>>> + * This optimization only works if we can write in the parent. >>>> */ >>>> - if (nfs_is_exclusive_create(dir, flags)) >>>> + if (nfs_is_exclusive_create(dir, flags) && >>>> + (inode_permission(dir, MAY_WRITE | MAY_READ | MAY_EXEC) == 0)) >>>> return NULL; >>>> >>> >>> NACK. The only write permission we should care about on the client side is whether or not the filesystem is mounted read-only. All other permissions are checked by the server. >> >> Right. This was mostly a discussion piece. >> The problem here is nfs_permission() returns 0 if you check for >> inode_permission(dir, MAY_WRITE | MAY_EXEC) (as in may_create), but then >> some other checks in the kernel still catch on to the fact that the directory >> is not writeable, so we have a premature failure with EPERM and server never sees >> this request which breaks things. > > Are these VFS level checks? Which ones? Yes, VFS level I believe. For Lustre it's may_create() from vfs_mkdir() that stops us short and the Lustre patch is effective. but for NFS this must be something else and I did not trace it completely. One of the security checks, I guess? if NFS patch is changed to check inode_permission(dir, MAY_OPEN | MAY_EXEC) as in Lustre, that returns 0 no matter what. This is trivial to reproduce too. >> (the read-only mount is not handled as well at the moment of course and my patch >> does not address this issue either, but it's easier to address in the VFS, like >> in filename_create() or something). >> >> I see that two major consumers of this nfs_permission MAY_WRITE|!MAY_READ check >> are creates and deletes and with deletes we had a lookup already, so it already >> looked up the child and revalidated the parent. >> For creates, a revalidation still might be needed, I guess and that was the main driver >> behind this check? And that only when you do current dir creates, because otherwise >> the parent would have been revalidated in lookup? >> Is this the major case why that check is actually there? >> >> Just trying to see how to approach this better without breaking the applications. > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html