On 2 Apr 2021, bfields@xxxxxxxxxxxx said:
> Sorry, did you say whether nfsd threads or rpc.mountd are blocked?

... just about to switch into debugging this, but it does seem to me that if nfsd threads or (especially) mountd on the server side were blocked, I'd see misbehaviour with mounts from every client, not just a few of them. This doesn't happen. While this is going on, my firewall and other clients not engaging in the problematic Steam-related activity can talk NFSv4 to the server perfectly happily: indeed, this is actually a problem when debugging, because I have to quiesce the bloody things as much as I can to stop their RPC traffic flooding the log with irrelevant junk :)

Recovery from this consists only of rebooting the stuck client: the server and all other clients don't need touching. (Indeed, I'm typing this in an emacs on that server, and since it was last rebooted it's been hit by a client experiencing this hang at least five times; the mailserver also keeps its mailspool on that server, and there have been no problems there either.)

(The server also has fairly silly amounts of RAM compared to the load it's placed under. I'm not concerned about the possibility of rpc.mountd getting swapped out. It just doesn't happen. Even things like git gc of the entire Chromium git repo proceed without swapping.)

btw, the filesystem layout on this machine is, in part:

    /dev/main/root             xfs  4294993920  738953092 3556040828 18% /
    /dev/mapper/main-vms       xfs  1073231868  406045460  667186408 38% /vm
    /dev/mapper/main-steam     ext4 1055852896   85367140  916781564  9% /pkg/non-free/steam
    /dev/mapper/main-archive   xfs  3219652608 2761922796  457729812 86% /usr/archive
    /dev/mapper/main-pete      xfs  2468405656 2216785448  251620208 90% /usr/archive/music/Pete
    /dev/mapper/main-phones    xfs    52411388    4354092   48057296  9% /.nfs/nix/share/phones
    /dev/mapper/main-unifi     xfs    10491804    1130228    9361576 11% /var/lib/unifi
    /dev/mapper/oracrypt-plain      2147510784  144030636 2003480148  7% /home/oranix/oracle/private
    ...
And you'll note that the exported fs I'm seeing hangs on is actually *not* the $HOME on the root fs: it's /pkg/non-free/steam, which is ext4 purely because so many games on x86 still fail horribly when 64-bit inode numbers are used, and ext4 can hand out 32-bit inode numbers on biggish filesystems without horrible performance consequences, unlike xfs. The relevant import line:

    loom:/pkg/non-free/steam/.steam /home/nix/.steam nfs defaults,_netdev

(so it is mounted on a *subdirectory* of a directory which is itself a mounted NFS export, and *that* one is exported from /).

The hang also happens when using nfusr as the NFS client for the .steam import, so whatever it is isn't just down to the client...

The primary reason I'm using one big fs for almost everything on this server build is, uh, NFSv4. My last machine had lots of little filesystems, and the result somehow confused the NFS pseudoroot construction so badly that most of the things I tried to export never appeared over NFSv4, only over v3: only those exports which *were* on the root filesystem were ever available for NFSv4 mounting, so I was stuck with v3 on that machine. At (IIRC) Chuck Lever's suggestion (many years ago, so he probably won't remember) I changed things when I built a new server, and was happy to find that with a less baroque setup and a bigger rootfs with more stuff on it, NFSv4 seemed perfectly happy and the pseudoroot was populated fine.

OK, let's collect some logs so we're not reasoning in the absence of data any more. Back soon! (I hope.)
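For anyone wanting to check whether a tree would trip up 32-bit programs in the way described above, here is a minimal sketch. It is my own illustration, not something used in this thread, and it assumes Python 3 on Linux. It walks a directory and lists every entry whose inode number does not fit in 32 bits. A 32-bit program built without large-file support gets EOVERFLOW from stat() on such entries. The inode numbers come from os.scandir(), i.e. the d_ino that readdir reports, which at mount points can differ from what stat() returns, and the walk does not stop at filesystem boundaries.

```python
import os
import sys

# Largest inode number a 32-bit ino_t can represent.
INO32_MAX = 2**32 - 1


def find_wide_inodes(root):
    """Yield (path, inode) for every entry below root whose inode number
    does not fit in 32 bits. Symlinks are not followed; the root itself
    is not checked, only its descendants."""
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as it:
                for entry in it:
                    # d_ino as reported by readdir for this entry.
                    ino = entry.inode()
                    if ino > INO32_MAX:
                        yield entry.path, ino
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Unreadable or vanished directory: skip it rather than abort.
            continue


if __name__ == "__main__" and len(sys.argv) > 1:
    count = 0
    for path, ino in find_wide_inodes(sys.argv[1]):
        print(f"{ino}\t{path}")
        count += 1
    print(f"{count} entries with inode numbers above {INO32_MAX}", file=sys.stderr)
```

Run it with the directory to scan as its only argument (for example the exported Steam tree). On an ext4 filesystem you would expect it to report nothing, whereas an xfs filesystem mounted with inode64 on a large device may turn up entries.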