Re: safe versions of NFS

(resending this as it bounced off the list - I accidentally embedded HTML)

Yes, if you're pretty sure your hostnames are all different, the client_ids should be different. For v4.0 you can turn on debugging (rpcdebug -m nfs -s proc) and see the client_id in the kernel log at mount time, in lines that look like: "NFS call setclientid auth=%s, '%s'\n". It doesn't look like we have any equivalent debugging of EXCHANGE_ID for v4.1 and v4.2.
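
For reference, a minimal sketch of what that looks like on a v4.0 client
(the export path and mount point below are just placeholders):

    # enable NFS proc-level debugging on the client
    rpcdebug -m nfs -s proc

    # do a fresh v4.0 mount, then look for the SETCLIENTID call in the log
    mount -t nfs -o vers=4.0 server:/export /mnt
    dmesg | grep "NFS call setclientid"

    # turn the debugging back off when you're done
    rpcdebug -m nfs -c proc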

You can extract it via the crash utility, or via systemtap, or by doing a
wire capture, but nothing that's easily translated to running across a large
number of machines.  There are probably other ways; perhaps we should add
that string to the tracepoints for exchange_id and setclientid.

If you're interested in troubleshooting, wire capture's usually the most
informative.  If the lockup events all happen at the same time, there
might be some network event that is triggering the issue.
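
A hedged example of grabbing just the NFS traffic for later inspection
(interface name, server name, and capture path are placeholders):

    # capture traffic to/from the server on the NFS port while reproducing a hang
    tcpdump -i eth0 -s 0 -w /tmp/nfs-hang.pcap host nfs-server and port 2049

    # EXCHANGE_ID (op 42) and SETCLIENTID (op 35) carry the client_id string,
    # so something like this in tshark should pull them out of the capture
    tshark -r /tmp/nfs-hang.pcap -Y "nfs.opcode == 42 or nfs.opcode == 35"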

You should expect NFSv4.1 to be rock-solid.  It's rare that we have reports
that it isn't, and I'd love to know why you're having these problems.

Ben

On 13 Apr 2021, at 11:38, hedrick@xxxxxxxxxxx wrote:

The server is ubuntu 20, with a ZFS file system.

I don’t set the unique ID. Documentation claims that it is set from the hostname. They will surely be unique, or the whole world would blow up. How can I check the actual unique ID being used? The kernel reports a blank one, but I think that just means to use the hostname. We could obviously set a unique one if that would be useful.
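
A sketch of where to look for it, assuming the nfs module is loaded (the
value in the last line is only a placeholder):

    # the module parameter; if it's empty, the client falls back to the hostname
    cat /sys/module/nfs/parameters/nfs4_unique_id

    # to pin an explicit value, a modprobe option along these lines should work
    echo "options nfs nfs4_unique_id=some-unique-string" > /etc/modprobe.d/nfs-client-id.conf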

On Apr 13, 2021, at 11:35 AM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:

It would be interesting to know why your clients are failing to reclaim their locks. Something is misconfigured. What server are you using, and is there anything fancy on the server-side (like HA)? Is it possible that you have clients with the same nfs4_unique_id?

Ben

On 13 Apr 2021, at 11:17, hedrick@xxxxxxxxxxx wrote:

Many, though not all, of the problems are “lock reclaim failed”.

On Apr 13, 2021, at 10:52 AM, Patrick Goetz <pgoetz@xxxxxxxxxxxxxxx> wrote:

I use NFS 4.2 with Ubuntu 18/20 workstations and Ubuntu 18/20 servers and haven't had any problems.

Check your configuration files; the last time I experienced something like this it was because I had inadvertently used the same fsid on two different exports. I also recommend exporting top-level directories only: bind mount everything you want to export into /srv/nfs and export only those directories (see the sketch below). According to Bruce F. this doesn't buy you any security (I still don't understand why), but it makes for a cleaner system configuration.
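
A rough sketch of that layout (paths, the client subnet, and the fsid values
are just examples):

    # /etc/fstab -- bind mount the real data under /srv/nfs
    /data/home     /srv/nfs/home     none  bind  0 0
    /data/scratch  /srv/nfs/scratch  none  bind  0 0

    # /etc/exports -- export only those top-level directories, each with its own fsid
    /srv/nfs          192.168.0.0/24(ro,fsid=0,crossmnt,no_subtree_check)
    /srv/nfs/home     192.168.0.0/24(rw,sync,no_subtree_check,fsid=1)
    /srv/nfs/scratch  192.168.0.0/24(rw,sync,no_subtree_check,fsid=2)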

On 4/13/21 9:33 AM, hedrick@xxxxxxxxxxx wrote:
I am in charge of a large computer science department's computing infrastructure. We have a variety of student and development users. If there are problems we’ll see them.
We use an Ubuntu 20 server, with NVMe storage.
I’ve just had to move CentOS 7 and Ubuntu 18 back to NFS 4.0. We had hangs with NFS 4.1 and 4.2: files would appear to be locked, although eventually the lock would time out. It’s too soon to be sure that moving back to NFS 4.0 will fix it. Next is either NFS 3 or disabling delegations on the server. Are there known versions of NFS that are safe to use in production for various kernel versions? The one we’re most interested in is Ubuntu 20, which can be running anything from kernel 5.4 to 5.8.
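
If it comes to that, a hedged sketch of both fallbacks (the export path and
mount point are placeholders, and this assumes the standard Linux knfsd
server, where delegations depend on file leases):

    # client side: pin a mount to v4.0 (the same option works in fstab)
    mount -t nfs -o vers=4.0 server:/export /mnt

    # server side: disabling leases should stop knfsd from handing out
    # delegations on new opens
    echo 0 > /proc/sys/fs/leases-enable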






