On Tue, 10 Feb 2015 17:48:48 +0000 Nix <nix@xxxxxxxxxxxxx> wrote:

> On 5 Feb 2015, NeilBrown spake thusly:
>
> > On Wed, 04 Feb 2015 23:28:17 +0000 Nix <nix@xxxxxxxxxxxxx> wrote:
> >> It doesn't. It still recurs.
> >
> > Is /usr/archive still exported to mutilate with crossmnt?
> > If it is, can you change to not do that (it is quite possible to have
> > different export options for different clients).
>
> OK. Adjusted.
>
> > I think that if crossmnt is enabled on the server, then explicitly
> > mounting /usr/archive/series will have the same net effect as not doing so
> > (though I'm not 100% certain).
> >
> > Also, can you try changing
> >    /proc/sys/fs/nfs/nfs_mountpoint_timeout
> >
> > It defaults to 500 (seconds - time for light from Sun to reach Earth).
> > If you make it smaller and the problem gets worse, or make it much bigger
> > and the problem goes away, that would be interesting.
> > If it makes no difference, that also would be interesting.
>
> Seems to make no difference, which is distinctly surprising. If
> anything, it happens more often at the default value than at either the
> high or low values. It's very erratic: it happened ten times in one day,
> then three days passed and it didn't happen at all... system under
> very similar load the whole time.
>
> From other prompts, what I'm seeing now -- but wasn't then, before I
> took the crossmnt out -- is an epidemic of spontaneous unmounting: i.e.,
> /usr/archive/series suddenly vanishes until remounted.
>
> I might just reboot all systems involved in this mess and hope it goes
> away. I have no *clue* what's going on, I've never seen it before, maybe
> it'll stop if I no longer believe in it.
>

This all sounds remarkably similar to a problem that a customer reported
recently. In that case the server was a NetApp and v4 was in use and the
server seemed to suggest that it was using volatile file handles.
If a filehandle for a mounted-on directory changes, then (I think) a new
inode will be allocated and the mountpoint will effectively disappear
(though I think it should remain in /proc/mounts).

However you have a Linux server and v3, so if it is the same problem, then
I completely misdiagnosed it.

I wonder if something is going wrong in nfs_prime_dcache(). The code looks
right, but it is a little complex...
You could rule that out by disabling READDIRPLUS by using the nordirplus
mount option. If that makes the problem go away, it would be very
interesting...

A more intrusive debugging approach would be to get d_drop() to scream if
the dentry being dropped had DCACHE_MOUNTED set.

Are you able to try either of those?

NeilBrown