Re: Problem with providing implementation id in NFSv4.1

"NeilBrown" <neilb@xxxxxxx> · Fri, 08 Jul 2022 17:01:48 +1000

On Thu, 07 Jul 2022, Trond Myklebust wrote:
> On Thu, 2022-07-07 at 11:13 +1000, NeilBrown wrote:
> > 
> > In NFSv4.1 when we EXCHANGE_ID to talk to a new server - possibly a
> > PNFS
> > Data Server that we haven't talked to before - we by default send an
> > implementation id.  This is created from several fields obtained from
> > utsname().
> > utsname() depends on current->nsproxy, and will crash if that is
> > NULL.
> > 
> > When a process exits it calls, among other things,
> > 
> >         exit_task_namespaces(tsk);
> >         exit_task_work(tsk);
> > 
> > exit_task_namespaces() will set ->nsproxy to NULL
> > exit_task_work() will run delayed work items, including fput() on all
> > files that were still open when the process exited.  This will cause
> > any
> > pending writes to be flushed for NFS.
> > 
> > So if a process writes to a file on a PNFS server, exits, and the MDS
> > tells the client to send the data to a DS which it hasn't established
> > a
> > connection with before, then it will crash in encode_exchange_id().
> > 
> > That order of calls in do_exit() is deliberate so we cannot swap them
> > - see
> > Commit: 8aac62706ada ("move exit_task_namespaces() outside of
> > exit_notify()")
> > 
> > The options that I can see are:
> > 1/ generate the implementation-id string at mount time and keep it
> >    around much like we do for cl_owner_id
> > 2/ Check current->nsproxy in encode_exchange_id() and skip the
> >    implementation id if ->nsproxy is not available.
> >    Note that there is no risk for a race with testing ->nxproxy.
> > 
> > Doesn't anyone have a strong opinion of which is best.  I'm inclined
> > to
> > go with '2', but mostly because it is less coding.
> > 
> 
> I'd be fine with ignoring the implementation id in that case. The
> protocol doesn't require it, so servers are expecte to be able to deal
> with that case.

Thanks for the quick reply.  I agree with you.
However it turns out that this isn't a problem in mainline.

When you close a file the ->flush() happens earlier in do_exit() and the
name spaces are still around.  However if you have a file mapped and
have dirty pages, then mainline doesn't force a flush on final unmap, or
exit of the process that had it mapped.

As I explained in
https://lore.kernel.org/all/150304037195.30218.15740287358704674869.stgit@noble/

I think this is wrong, so I have an fsync() call in nfs_file_release().
This *is* run after nsproxy has been cleared.

So mainline doesn't need this fix, but I do.

Thanks,
NeilBrown