Re: 3.7-rc1 NFSv3/sec=krb5 mkdir failure

On Nov 15, 2012, at 9:26 PM, "Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx> wrote:

> On Thu, 2012-11-15 at 21:03 -0500, Chuck Lever wrote:
>> On Oct 28, 2012, at 12:15 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>> 
>>> On Wed, Oct 24, 2012 at 04:40:59PM -0400, J. Bruce Fields wrote:
>>>> On Wed, Oct 24, 2012 at 08:34:37PM +0000, Myklebust, Trond wrote:
>>>>>> -----Original Message-----
>>>>>> From: Myklebust, Trond
>>>>>> Sent: Wednesday, October 24, 2012 4:31 PM
>>>>>> To: 'J. Bruce Fields'
>>>>>> Cc: linux-nfs@xxxxxxxxxxxxxxx; Schumaker, Bryan
>>>>>> Subject: RE: 3.7-rc1 NFSv3/sec=krb5 mkdir failure
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: J. Bruce Fields [mailto:bfields@xxxxxxxxxxxx]
>>>>>>> Sent: Wednesday, October 24, 2012 4:15 PM
>>>>>>> To: Myklebust, Trond
>>>>>>> Cc: linux-nfs@xxxxxxxxxxxxxxx; Schumaker, Bryan
>>>>>>> Subject: Re: 3.7-rc1 NFSv3/sec=krb5 mkdir failure
>>>>>>> 
>>>>>>> On Wed, Oct 24, 2012 at 08:07:55PM +0000, Myklebust, Trond wrote:
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: linux-nfs-owner@xxxxxxxxxxxxxxx [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx] On Behalf Of J. Bruce Fields
>>>>>>>>> Sent: Wednesday, October 24, 2012 4:03 PM
>>>>>>>>> To: linux-nfs@xxxxxxxxxxxxxxx; Myklebust, Trond; Schumaker, Bryan
>>>>>>>>> Subject: Re: 3.7-rc1 NFSv3/sec=krb5 mkdir failure
>>>>>>>>> 
>>>>>>>>> Anyone get a chance to look at this?  It seems very reproducible.
>>>>>>>>> 
>>>>>>>>> --b.
>>>>>>>>> 
>>>>>>>>> On Tue, Oct 16, 2012 at 08:58:32AM -0400, bfields wrote:
>>>>>>>>>> On 3.7-rc1:
>>>>>>>>>> 
>>>>>>>>>>    client# mount -tnfs -osec=krb5,vers=3 server:/exports/ext4 /mnt/
>>>>>>>>>> 
>>>>>>>>>>        server# ls -l /exports/ext4|grep TMP
>>>>>>>>>>        server#
>>>>>>>>>> 
>>>>>>>>>>    # mkdir /mnt/TMP
>>>>>>>>>>    mkdir: cannot create directory `/mnt/TMP': Permission denied
>>>>>>>>>> 
>>>>>>>>>>        server# ls -l /exports/ext4|grep TMP
>>>>>>>>>>        drwxr-xr-x  2 nfsnobody nfsnobody 4096 Oct 16 08:56 TMP
>>>>>>>>>>        server#
>>>>>>>>>> 
>>>>>>>>>> Wireshark also shows that the create succeeds.
>>>>>>>> 
>>>>>>>> Can you share the wireshark trace?
>>>>>>> 
>>>>>>> Sure.  This covers the mount and mkdir.  The mkdir call and reply are
>>>>>>> in frames 77 and 78.
>>>>>> 
>>>>>> Hmm.... Can you please check if the ACL is being set correctly on the server? I
>>>>>> suspect that might be the source of the error.
>>>>>> 
>>>>> 
>>>>> In fact, can you see if mounting with '-onoacl' causes the whole thing to succeed?
>>>> 
>>>> That's on the client mount command?  No difference.
>>> 
>>> By the way, I managed to do a little bisecting while working on
>>> something else today, and blame landed on Chuck's
>>> ba9b584c1dc37851d9c6ca6d0d2ccba55d9aad04 "SUNRPC: Introduce
>>> rpc_clone_client_set_auth()".  Which makes some sense if it's an ACL
>>> problem, and indeed testing on that commit finds success with -noacl,
>>> failure without.
>> 
>> After two weeks, Bruce and I were finally able to catch up in person.
>> 
>> I've reproduced this on 3.7-rc5 using cthon basic tests.  The first getacl operation fails because it's mistakenly attempting to set up a fresh GSS context on a transport where one already exists.  That's in line with the kind of change that's in commit ba9b584c1.
> 
> Why shouldn't we be able to cope with multiple GSS sessions on the same
> transport?

Perhaps we should be able to, in general.  But the successful case here does not attempt to create a new context; it simply uses one that is already associated with the transport.  That indicates that the kernel is making an incorrect upcall request, perhaps because the new code is not cloning the RPC client correctly.

> 
>>> I'm not sure if that explains the failure I was seeing on 3.7-rc1, since
>>> there I didn't see any ACL traffic, and still got a failure.  (And
>>> -noacl didn't help.)
>> 
>> The failure occurs on the client just before the getacl request is issued, so you won't see any ACL-related network traffic in the failure case.  The failure prevents any ACL request from succeeding.
> 
> Is it gssd that is failing then?

The upcall fails, yes.  That is translated into an immediate failure of the getacl operation.
