Re: [PATCH 1/2] nfs: Fix spurios EPERM when mkdir of existing dentry

Oleg Drokin <green@xxxxxxxxxxxxxx> · Thu, 7 Jul 2016 17:52:47 -0400

On Jul 7, 2016, at 1:27 PM, Trond Myklebust wrote:

> 
>> On Jul 7, 2016, at 13:07, Oleg Drokin <green@xxxxxxxxxxxxxx> wrote:
>> 
>> 
>> On Jul 7, 2016, at 12:59 PM, Trond Myklebust wrote:
>> 
>>> 
>>>> On Jul 7, 2016, at 12:52, Oleg Drokin <green@xxxxxxxxxxxxxx> wrote:
>>>> 
>>>> 
>>>> On Jul 7, 2016, at 12:16 PM, Trond Myklebust wrote:
>>>> 
>>>>> 
>>>>>> On Jul 7, 2016, at 01:53, Oleg Drokin <green@xxxxxxxxxxxxxx> wrote:
>>>>>> 
>>>>>> It's great when we can shave an extra RPC, but not at the expense
>>>>>> of correctness.
>>>>>> We should not return EPERM (from vfs_create/mknod/mkdir) if the
>>>>>> name already exists, even if we have no write access in parent.
>>>>>> 
>>>>>> Since the check in nfs_permission is clearly not enough to stave
>>>>>> off this, just throw in the extra READ access to actually
>>>>>> go through.
>>>>>> 
>>>>>> Signed-off-by: Oleg Drokin <green@xxxxxxxxxxxxxx>
>>>>>> ---
>>>>>> fs/nfs/dir.c | 4 +++-
>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>> 
>>>>>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>>>>>> index d8015a0..8c7835b 100644
>>>>>> --- a/fs/nfs/dir.c
>>>>>> +++ b/fs/nfs/dir.c
>>>>>> @@ -1383,8 +1383,10 @@ struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, unsigned in
>>>>>> 	/*
>>>>>> 	 * If we're doing an exclusive create, optimize away the lookup
>>>>>> 	 * but don't hash the dentry.
>>>>>> +	 * This optimization only works if we can write in the parent.
>>>>>> 	 */
>>>>>> -	if (nfs_is_exclusive_create(dir, flags))
>>>>>> +	if (nfs_is_exclusive_create(dir, flags) &&
>>>>>> +	    (inode_permission(dir, MAY_WRITE | MAY_READ | MAY_EXEC) == 0))
>>>>>> 		return NULL;
>>>>>> 
>>>>> 
>>>>> NACK. The only write permission we should care about on the client side is whether or not the filesystem is mounted read-only. All other permissions are checked by the server.
>>>> 
>>>> Right. This was mostly a discussion piece.
>>>> The problem here is nfs_permission() returns 0 if you check for
>>>> inode_permission(dir, MAY_WRITE | MAY_EXEC) (as in may_create), but then
>>>> some other checks in the kernel still catch on to the fact that the directory
>>>> is not writeable, so we have a premature failure with EPERM and server never sees
>>>> this request which breaks things.
>>> 
>>> Are these VFS level checks? Which ones?
>> 
>> Yes, VFS level I believe.
>> For Lustre it's may_create() from vfs_mkdir() that stops us short
>> and the Lustre patch is effective.
>> but for NFS this must be something else and I did not trace
>> it completely. One of the security checks, I guess?
>> if NFS patch is changed to check inode_permission(dir, MAY_OPEN | MAY_EXEC)
>> as in Lustre, that returns 0 no matter what.
>> 
>> This is trivial to reproduce too.
> 
> So, should we be ignoring the MAY_EXEC flag when the VFS asks for inode_permission(dir, MAY_WRITE|MAY_EXEC)? I suspect that is the problem here.

You know, I have been wrong here.
VFS is not in the way.
it's the NFS server that's returning us the error:
[  326.069066] NFS: mkdir(0:44/12), test
[  326.070510] nfsd_dispatch: vers 3 proc 9
[  326.071612] nfsd: MKDIR(3)    16: 01010001 00000000 0000000c 4b3b7a92 00000000 00000000 test
[  326.071617] nfsd: fh_verify(16: 01010001 00000000 0000000c 4b3b7a92 00000000 00000000)
[  326.071684] fh_verify: /racer permission failure, acc=3, error=13

So I guess nfsd wants to be lazy too?
What side do you think this is best fixed then? the client or the server?--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html