> On Feb 24, 2022, at 4:25 PM, Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> wrote:
>
> On Thu, Feb 24, 2022 at 1:20 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>
>>
>>> On Feb 24, 2022, at 12:55 PM, Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> wrote:
>>>
>>> On Thu, Feb 24, 2022 at 10:30 AM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>>>
>>>>> On Feb 23, 2022, at 12:40 PM, Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> wrote:
>>>>>
>>>>> From: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>>>
>>>>> Introduce a new mount option -- trunkdiscovery,notrunkdiscovery -- to
>>>>> toggle whether or not the client will engage in actively discovery
>>>>> of trunking locations.
>>>>
>>>> An alternative solution might be to change the client's
>>>> probe to treat NFS4ERR_DELAY as "no trunking information
>>>> available" and then allow operation to proceed on the
>>>> known good transport.
>>>
>>> I'm not sure what you mean about "the known good transport".
>>
>> The transport on which the client sent the
>> GETATTR(fs_locations).
>>
>> The NFS4ERR_DELAY response means the server has no other
>> trunks available "at this time."
>
> But GETATTR(fs_locations) isn't only used for trunking query, it's
> used for filesystem location (migration) as well. Are we redefining
> what ERR_DELAY means in the context of trunking vs migration?

I don't think I'm redefining what is described in RFC 8881
Section 15.1.1.3. The meaning of that status code is still
the same; it's the client's recovery action that can be made
to be different.

During migration, NFS4ERR_DELAY holds off the client until
open and lock state has been transitioned to the destination
server. In that case DELAY has to serialize further operations
from the client, and waiting and retrying is the correct
response. I mean, the client won't know the hostname of the
destination until the GETATTR(fs_locations) returns a
successful result.

For trunking discovery, DELAY still means roughly -EAGAIN.
But it's up to the caller whether and when to try the
operation again. I'm suggesting that in the context of
trunking discovery, there's no need to halt progress until
trunking discovery succeeds. The discovery probe can be
dropped or retried in the background.


>>> I do object to treating a single ERR_DELAY during discovery as a
>>> permanent error as there are legitimate reasons to a delay in looking
>>> up the information that can be resolved in time by the server.
>>> However, I don't object to putting a time limit or number of tries on
>>> ERR_DELAY as safety wheels.
>>
>> In the past, some have objected to /any/ delay added to
>> the NFS mount process.
>
> I again would like to note that fs_locations is a file system
> attribute thus I would argue has to be treated as other file system
> attributes.

True, fs_locations, as it was originally defined, is a
per-filesystem attribute. But I don't see how that is relevant
to this issue. The client doesn't have to wait for trunking
information to start its operation using the main transport.


>>> Lastly, I think perhaps we can do both have a mount option to toggle
>>> discovery as well as safeguard the discovery from broken servers?
>>
>> I'd really rather not add a mount option for this
>> purpose unless you know of another reason why trunking
>> discovery needs to be disabled.
>
> I don't offhand. I thought it is the simplest and most appropriate
> solution and perhaps inline with "migration/nomigration" option but I
> must be mistaken there.

The "migration" option was a last resort. There were really
no other options to deal with servers that depend on
non-uniform client IDs. There is an argument to be made that
we shouldn't have added that mount option because it controls
the behavior of all the mounts on that client. IMO you
shouldn't use "migration" as any kind of precedent.

--
Chuck Lever
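
To illustrate the distinction drawn above (same status code, different
recovery action depending on the caller), here is a minimal sketch. It
is a toy model in Python, not the Linux client's code and not a proposed
patch: get_fs_locations, Purpose, send_getattr and wait are hypothetical
names made up for the illustration. Only the numeric value of
NFS4ERR_DELAY (10008) is taken from the protocol.

```python
from enum import Enum

NFS4_OK = 0
NFS4ERR_DELAY = 10008  # nfsstat4 value defined by the NFSv4 protocol


class Purpose(Enum):
    """Why the caller is asking for fs_locations (hypothetical)."""
    MIGRATION = "migration"
    TRUNKING_DISCOVERY = "trunking discovery"


def get_fs_locations(send_getattr, purpose, wait=lambda attempt: None):
    """Model of caller-dependent recovery from NFS4ERR_DELAY.

    send_getattr: callable returning (status, locations); stands in for
                  sending GETATTR(fs_locations) on the main transport.
    wait:         callable invoked before each retry (a real client
                  would sleep or back off here).

    Returns (locations, retry_later). retry_later is True when the
    caller should go ahead without the answer and may re-probe in the
    background.
    """
    attempt = 0
    while True:
        status, locations = send_getattr()
        if status == NFS4_OK:
            return locations, False
        if status != NFS4ERR_DELAY:
            raise OSError(f"GETATTR(fs_locations) failed: status {status}")

        if purpose is Purpose.TRUNKING_DISCOVERY:
            # DELAY is treated as "no trunking information available
            # right now": do not hold up the mount, keep using the
            # transport the request was sent on.
            return [], True

        # Migration: the destination is unknown until the request
        # succeeds, so the only useful action is to wait and retry.
        attempt += 1
        wait(attempt)
```

With this shape, a discovery caller that gets retry_later=True continues
on the main transport and decides for itself whether to drop the probe
or retry it later (optionally with a cap on tries), while a migration
caller blocks until the server stops returning DELAY.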