Re: Problem with providing implementation id in NFSv4.1

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Thu, 7 Jul 2022 04:17:18 +0000

On Thu, 2022-07-07 at 11:13 +1000, NeilBrown wrote:
> 
> In NFSv4.1 when we EXCHANGE_ID to talk to a new server - possibly a
> PNFS
> Data Server that we haven't talked to before - we by default send an
> implementation id.  This is created from several fields obtained from
> utsname().
> utsname() depends on current->nsproxy, and will crash if that is
> NULL.
> 
> When a process exits it calls, among other things,
> 
>         exit_task_namespaces(tsk);
>         exit_task_work(tsk);
> 
> exit_task_namespaces() will set ->nsproxy to NULL
> exit_task_work() will run delayed work items, including fput() on all
> files that were still open when the process exited.  This will cause
> any
> pending writes to be flushed for NFS.
> 
> So if a process writes to a file on a PNFS server, exits, and the MDS
> tells the client to send the data to a DS which it hasn't established
> a
> connection with before, then it will crash in encode_exchange_id().
> 
> That order of calls in do_exit() is deliberate so we cannot swap them
> - see
> Commit: 8aac62706ada ("move exit_task_namespaces() outside of
> exit_notify()")
> 
> The options that I can see are:
> 1/ generate the implementation-id string at mount time and keep it
>    around much like we do for cl_owner_id
> 2/ Check current->nsproxy in encode_exchange_id() and skip the
>    implementation id if ->nsproxy is not available.
>    Note that there is no risk for a race with testing ->nxproxy.
> 
> Doesn't anyone have a strong opinion of which is best.  I'm inclined
> to
> go with '2', but mostly because it is less coding.
> 

I'd be fine with ignoring the implementation id in that case. The
protocol doesn't require it, so servers are expecte to be able to deal
with that case.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx