> On Feb 24, 2022, at 12:55 PM, Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> wrote:
>
> On Thu, Feb 24, 2022 at 10:30 AM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>
>>> On Feb 23, 2022, at 12:40 PM, Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> wrote:
>>>
>>> From: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>
>>> Introduce a new mount option -- trunkdiscovery,notrunkdiscovery -- to
>>> toggle whether or not the client will engage in actively discovery
>>> of trunking locations.
>>
>> An alternative solution might be to change the client's
>> probe to treat NFS4ERR_DELAY as "no trunking information
>> available" and then allow operation to proceed on the
>> known good transport.
>
> I'm not sure what you mean about "the known good transport".

The transport on which the client sent the GETATTR(fs_locations).
The NFS4ERR_DELAY response means the server has no other trunks
available "at this time."

> I don't
> think the ERR_DELAY is associated with a transport. Btw, if you saw a
> previous patch which restricts fs_location query to the main transport
> makes your statement even more confusing as it would mean there is no
> good transport. Or do you mean to say we should have trunking
> discovery done asynchronous to mount by a separate kernel thread and
> therefore not impact mount steps?

Yes, something like that. Trunking discovery that is independent
of the NFS mount process should be the goal.

In fact, trunking discovery really ought to be done in user space.

- There is now a user/kernel API for managing transports

- The trunking configuration on the server might change during
  the lifetime of the mount, so periodic checking is needed

- Adding an extra round trip, especially one that might be
  slowed by one or more NFS4ERR_DELAY replies, is going to be
  a problem during a mount storm

- There might be local policies that affect which network paths
  to choose for trunking

- The choice of transports might be made automatically by an
  orchestrator

- Tying this setting to a mount option is not appropriate
  because the transports are shared among multiple NFS mounts

> I do object to treating a single ERR_DELAY during discovery as a
> permanent error as there are legitimate reasons to a delay in looking
> up the information that can be resolved in time by the server.
> However, I don't object to putting a time limit or number of tries on
> ERR_DELAY as safety wheels.

In the past, some have objected to /any/ delay added to the NFS
mount process.

There's no reason to hold up the mount process -- the client can
try the trunking discovery probe again in a few moments while
the mount proceeds, can't it? If that means handing the probe
to a work queue or leaving it to user space, that seems like a
more flexible choice.

> Lastly, I think perhaps we can do both have a mount option to toggle
> discovery as well as safeguard the discovery from broken servers?

I'd really rather not add a mount option for this purpose
unless you know of another reason why trunking discovery
needs to be disabled.

The best solution is to fix the server implementations. If
that's not possible then the second best is to have the client
manage the situation without needing any human intervention.

Adding an administrative tunable is, to my mind, an option of
the very last resort.

--
Chuck Lever