On Wed, 2023-10-11 at 10:38 +1100, NeilBrown wrote: > On Thu, 28 Sep 2023, Trond Myklebust wrote: > > On Wed, 2023-09-27 at 12:10 +1000, NeilBrown wrote: > > > > > > hi Trond, > > > I'm trying to understand > > > > > > Commit 76c21e3f70a8 ("mountd: Check the stat() return values in > > > match_fsid()") > > > > > > in nfs-utils. > > > > > > The effect of this patch is that if a 'stat' of any path in > > > /etc/exports or any mountpoint below any path marked crossmnt > > > fails > > > with an error other than one of a small set, then the fsid to > > > path > > > lookup aborts without reporting anything to the kernel, so the > > > kernel > > > doesn't reply to the client and the mount attempt blocks > > > indefinitely. > > > > > > I have seen this happen when "/" is exported crossmnt, and when > > > a > > > stat > > > of /run/user/1000/doc returns EACCES. This is a "fuse" mount > > > for > > > user > > > 1000, and presumably it rejects any access from any other user. > > > > > > Could you please help me understand what this patch was trying > > > to > > > achieve? What sorts of errors were you expecting this to catch? > > > Would it make sense to silently ignore the stat failure for > > > paths > > > that > > > were found when scanning the mount table, and only take the more > > > drastic action for paths explicitly listed in /etc/exports ?? > > > > > > > > > > The point is that if the filesystem is re-exported NFS, and you > > mount > > as 'softerr' in order to avoid hanging your knfsd server on > > ephemeral > > errors, then you have to be more careful about which errors are > > fatal, > > and hence need to be propagated to the kernel export/path cache, > > and > > which ones are just temporary due to a network partition or server > > reboot. > > > > Hence the creation of path_lookup_error() which differentiates > > between > > the two cases. > > Thanks for the explanation - and sorry about the delay in following > up - > leave and stuff.... > > If I understand the 'softerr' option correctly, the particular error > code you need to catch in this case is ETIMEDOUT. That is the error > the > mountd would get when stating an NFS mount that wasn't responding. > So it would be safe for me to add EACCES to the set of errors that > path_lookup_error() checks - just as long as I don't add ETIMEDOUT. > Do you agree? > I think that should be safe, but it is worth testing. I don't remember exactly why I excluded EACCES from the list. > But I think there is another issue here. If the 'stat' of the > mountpoint returns ETIMEDOUT we want a transient failure so that the > kernel will ask again - presumably when the client asks again. > That doesn't happen with the current code. When the kernel queues a > request to mountd, it assumes that the request WILL be answered. It > never resends the same request. > > So mountd must reply with something. I think that returning a > negative > result with an expiry that has already passed might have the desired > result, but we would need to test that. > Ideally, we probably normally want to pass the filesystem error down into knfsd so that it can do the right thing. For ETIMEDOUT, that should result in an NFS4ERR_DELAY/NFS3ERR_JUKEBOX being sent to the client, which is what we want here. I'm not sure about caching these errors. It seems to me that while ETIMEDOUT might allow a time-bounded cache, the EACCES is a per-user error, so can't really be cached unless you are able to cache it as a per-user thing in knfsd. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx