Re: safe versions of NFS

> On Apr 13, 2021, at 12:23 PM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
> 
> (resending this as it bounced off the list - I accidentally embedded HTML)
> 
> Yes, if you're pretty sure your hostnames are all different, the client_ids
> should be different.  For v4.0 you can turn on debugging (rpcdebug -m nfs -s
> proc) and see the client_id in the kernel log in lines that look like: "NFS
> call setclientid auth=%s, '%s'\n", which will happen at mount time, but it
> doesn't look like we have any debugging for v4.1 and v4.2 for EXCHANGE_ID.
> 
> You can extract it via the crash utility, or via systemtap, or by doing a
> wire capture, but none of those is easily translated to running across a
> large number of machines.  There are probably other ways; perhaps we should
> add that string to the tracepoints for exchange_id and setclientid.
> 
> If you're interested in troubleshooting, wire capture's usually the most
> informative.  If the lockup events all happen at the same time, there
> might be some network event that is triggering the issue.
> 
> You should expect NFSv4.1 to be rock-solid.  It's rare we have reports
> that it isn't, and I'd love to know why you're having these problems.

I echo that: the NFSv4.1 protocol and implementation are mature, so if
there are operational problems, they should be root-caused.

NFSv4.1 uses a uniform client ID. That should be the "good" one,
not the NFSv4.0 one that has a non-zero probability of collision.
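
For the NFSv4.0 clients, one rough way to check for a collision is to
collect the kernel log from each host with the rpcdebug setting Ben
mentions and compare the strings. The following is only a sketch, assuming
the log lines match the "NFS call setclientid auth=%s, '%s'" format quoted
above. The function names and the way logs are gathered per host are
hypothetical and not part of any NFS tooling.

```python
import re
from collections import defaultdict

# Matches the debug line format quoted in this thread:
#   NFS call setclientid auth=%s, '%s'
# Whitespace after "call" is matched loosely in case the log pads it.
SETCLIENTID_RE = re.compile(r"NFS call\s+setclientid auth=(\S+), '([^']*)'")


def extract_client_ids(log_text):
    """Return the client ID strings found in one host's kernel log text."""
    return [m.group(2) for m in SETCLIENTID_RE.finditer(log_text)]


def find_collisions(logs_by_host):
    """Map each client ID string seen on more than one host to those hosts.

    logs_by_host is a dict of {hostname: kernel log text}; how the logs are
    collected (ssh, syslog forwarding, ...) is left to the site.
    """
    hosts_by_id = defaultdict(set)
    for host, log_text in logs_by_host.items():
        for client_id in extract_client_ids(log_text):
            hosts_by_id[client_id].add(host)
    return {
        cid: sorted(hosts)
        for cid, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }
```

An empty result means no two hosts logged the same string. Any entry that
does come back names the hosts sharing an ID, which would be worth
investigating first.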

Charles, please let us know if there are particular workloads that
trigger the lock reclaim failure. A narrow reproducer would help
get to the root issue quickly.


> Ben
> 
> On 13 Apr 2021, at 11:38, hedrick@xxxxxxxxxxx wrote:
> 
>> The server is ubuntu 20, with a ZFS file system.
>> 
>> I don’t set the unique ID. Documentation claims that it is set from the hostname. They will surely be unique, or the whole world would blow up. How can I check the actual unique ID being used? The kernel reports a blank one, but I think that just means to use the hostname. We could obviously set a unique one if that would be useful.
>> 
>>> On Apr 13, 2021, at 11:35 AM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>> 
>>> It would be interesting to know why your clients are failing to reclaim their locks.  Something is misconfigured.  What server are you using, and is there anything fancy on the server-side (like HA)?  Is it possible that you have clients with the same nfs4_unique_id?
>>> 
>>> Ben
>>> 
>>> On 13 Apr 2021, at 11:17, hedrick@xxxxxxxxxxx wrote:
>>> 
>>>> many, though not all, of the problems are “lock reclaim failed”.
>>>> 
>>>>> On Apr 13, 2021, at 10:52 AM, Patrick Goetz <pgoetz@xxxxxxxxxxxxxxx> wrote:
>>>>> 
>>>>> I use NFS 4.2 with Ubuntu 18/20 workstations and Ubuntu 18/20 servers and haven't had any problems.
>>>>> 
>>>>> Check your configuration files; the last time I experienced something like this, it was because I had inadvertently used the same fsid on two different exports. I also recommend exporting top-level directories only.  Bind mount everything you want to export into /srv/nfs and only export those directories. According to Bruce F. this doesn't buy you any security (I still don't understand why), but it makes for a cleaner system configuration.
>>>>> 
>>>>> On 4/13/21 9:33 AM, hedrick@xxxxxxxxxxx wrote:
>>>>>> I am in charge of a large computer science dept computing infrastructure. We have a variety of student and development users. If there are problems we’ll see them.
>>>>>> We use an Ubuntu 20 server, with NVMe storage.
>>>>>> I’ve just had to move Centos 7 and Ubuntu 18 to use NFS 4.0. We had hangs with NFS 4.1 and 4.2. Files would appear to be locked, although eventually the lock would time out. It’s too soon to be sure that moving back to NFS 4.0 will fix it. Next is either NFS 3 or disabling delegations on the server.
>>>>>> Are there known versions of NFS that are safe to use in production for various kernel versions? The one we’re most interested in is Ubuntu 20, which can be anything from 5.4 to 5.8.
>>> 
> 
> 
> 

--
Chuck Lever






