Re: [RFC-PATCH] nfsd: when unhashing openowners, increment openowner's refcount

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Wed, 28 Aug 2019 12:54:29 -0400

On Wed, Aug 28, 2019 at 06:20:22PM +0300, Alex Lyakas wrote:
> On Tue, Aug 27, 2019 at 11:51 PM J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> >
> > On Tue, Aug 27, 2019 at 12:05:28PM +0300, Alex Lyakas wrote:
> > > Is the described issue familiar to you?
> >
> > Yep, got it, but I haven't seen anyone try to solve it using the fault
> > injection code, that's interesting!
> >
> > There's also fs/nfsd/unlock_filesystem.  It only unlocks NLM (NFSv3)
> > locks.  But it'd probably be reasonable to teach it to get NFSv4 state
> > too (locks, opens, delegations, and layouts).
> >
> > But my feeling's always been that the cleanest way to do it is to create
> > two containers with separate net namespaces and run nfsd in both of
> > them.  You can start and stop the servers in the different containers
> > independently.
> 
> I am looking at the code, and currently nfsd creates a single
> namespace subsystem in init_nfsd. All nfs4_clients run in this
> subsystem.
> 
> So the proposal is to use register_pernet_subsys() for every
> filesystem that is exported?

No, I'm not proposing any kernel changes.  Just create separate net
namespaces from userspace and start nfsd from within them.  And you'll
also need to arrange for the different nfsds to get different exports.

In practice, the best way to do this may be using some container
management service, I'm not sure.
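
For illustration only, here is a minimal userspace sketch of the
"separate net namespace" step, assuming Linux and CAP_SYS_ADMIN.  The
helper name run_in_new_netns() is hypothetical, and the command that
actually brings up nfsd and its exports inside the namespace is left to
the caller; a container manager would normally do all of this for you.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[] in a child process that has its own,
 * freshly created network namespace.
 *
 * Returns the child's exit status, 126 if the namespace could not be
 * created (for example, missing CAP_SYS_ADMIN), 127 if the exec failed,
 * or -1 on fork/wait errors or abnormal termination.
 */
static int run_in_new_netns(char *const argv[])
{
	pid_t pid;
	int status;

	assert(argv != NULL && argv[0] != NULL);

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: detach from the parent's network namespace. */
		if (unshare(CLONE_NEWNET) != 0) {
			perror("unshare(CLONE_NEWNET)");
			_exit(126);
		}
		/* Whatever starts nfsd in this namespace goes here. */
		execvp(argv[0], argv);
		perror("execvp");
		_exit(127);
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that a namespace created this way goes away once nothing holds it
any more (no process in it and no bind mount pinning it), so the command
you run needs to be long-lived.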

> I presume that current nfsd code cannot
> do this, and some rework is required to move away from a single
> subsystem to per-export subsystem. Also, grepping through kernel code,
> I see that namespace subsystems are created by different modules as
> part of module initialization, rather than doing that dynamically.
> Furthermore, in our case the same nfsd machine S can export tens or
> even hundreds of local filesystems. Is this fine to have hundreds of
> subsystems?

I haven't done it myself, but I suspect hundreds of containers should be
OK.  It may depend on available resources, of course.

> Otherwise, I understand that the current behavior is a "won't fix",
> and it is expected for the client machine to unmount the export before
> un-exporting the file system at nfsd machine. Is this correct?

You're definitely not the only ones to request this, so I'd like to have
a working solution.

My preference would be to try the namespace/container approach first.
And if that turns out not to work well for some reason, to update
fs/nfsd/unlock_filesystem to handle NFSv4 stuff.

The fault injection code isn't the right interface for this.  Even if we
did decide it was worth fixing up and maintaining--it's really only
designed for testing clients.  I'd expect distros not to build it in
their default kernels.

--b.