Hi Olga, Chuck,
On 16/03/2022 17:09, Olga Kornievskaia wrote:
On Wed, Mar 16, 2022 at 11:14 AM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
On Mar 16, 2022, at 8:39 AM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
On 25 Feb 2022, at 17:48, Kurt Garloff wrote:
Hi,
Am 24. Februar 2022 14:42:41 MEZ schrieb Kurt Garloff <kurt@xxxxxxxxxx>:
Hi Olga,
[...]
I see a number of possibilities to resolve this:
(0) We pretend it's not a problem that's serious enough to take
action (and ensure that we do document this new option well).
(1) We revert the patch that does FS_LOCATIONS discovery.
Assuming that FS_LOCATIONS does provide a useful feature, this
would not be our preferred solution, I guess ...
(2) We prevent NFS v4.1 servers to use FS_LOCATIONS (like my patch
implements) and additionally allow for the opt-out with
notrunkdiscovery mount option. This fixes the known regression
with Qnap knfsd (without requiring user intervention) and still
allows for FS_LOCATIONS to be useful with NFSv4.2 servers that
support this. The disadvantage is that we won't use the feature
on NFSv4.1 servers which might support this feature perfectly
(and there's no opt-in possibility). And the risk is that there
might be NFSv4.2 servers out there that also misreport
FS_LOCATIONS support and still need manual intervention (which
at least is possible with notrunkdiscovery).
(3) We make this feature an opt-in thing and require users to
pass trunkdiscovery to take advantage of the feature.
This has zero risk of breakage (unless we screw up the patch),
but always requires user intervention to take advantage of
the FS_LOCATIONS feature.
(4) Identify a way to recover from the mount with FS_LOCATIONS
against the broken server, so instead of hanging we do just
turn this off if we find it not to work. Disadavantage is that
this adds complexity and might stall the mounting for a while
until the recovery worked. The complexity bears the risk for
us screwing up.
I personally find solutions 2 -- 4 acceptable.
If the experts see (4) as simple enough, it might be worth a try.
Otherwise (2) or (3) would seem the way to go from my perspective.
Any thought ls?
I think (3) is the best way, and perhaps using sysfs to toggle it would
be a solution to the problems presented by a mount option.
I'm worried that this issue is being ignored because that's usually what
happens when requests/patches are proposed that violate the policy of "we do
not fix the client for server bugs". In this case that policy conficts with
"no user visible regressions".
Can anyone declare which policy takes precedent?
"No regressions" probably takes precedent. IMO 1) should be done
immediately.
This is not a server bug, necessarily. The server is responding
within the realm of what is allowed by specification and I can
see cases where a server might have a good reason to DELAY an
fs_locations request for a while.
So I think once 1) has been done and backported to stable, the
client functionality should be restored via 4) to ensure that a
server can't delay a client mount indefinitely. (Although I think
I've stated that preference before).
Reverting the patch is equivalent to introducing a mount option (with
what I'm hearing a preference of notrunking being the default) but
possibly better. It solves the problems of the broken servers and it
allows servers that want this functionality to use it.
I agree with reverting and then introducing and opt-in setting instead.
I have not seen this submitted yet (though I have to admit that
I may have missed it).
So who is going to code this. If there is noone who has capacity
to do so, I can certainly jump in, though I am probably the least
skilled person in this conversation.
I don't see any need to involve a human in making the decision
to try fs_locations. The client should try fs_locations and if
it doesn't work, just move on as quickly as it can. As always,
I don't relish adding more administrative controls if we can
avoid it. Such controls add to our test matrix and our long
term technical debt. Easy to add, but very difficult to change
or remove once merged. I can't see an upside here.
That would be the better approach, but it also certainly is less
straight-forward to implement.
I'm happy to test, if we get some patch to look at short-term.
Otherwise, I'd suggest to go for reversion first and keep this
plan in the back of our minds for later execution.
Thanks.
--
Kurt Garloff <kurt@xxxxxxxxxx>
Cologne, Germany
--
Chuck Lever