Re: [PATCH RFC 0/2] nfsd: add principal to the data being tracked by nfsdcld

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Wed, 31 Jul 2019 10:31:56 -0400

On Wed, Jul 31, 2019 at 08:07:53AM -0400, Scott Mayhew wrote:
> On Tue, 30 Jul 2019, J. Bruce Fields wrote:
> 
> > On Tue, Jul 30, 2019 at 05:08:45PM -0400, Scott Mayhew wrote:
> > > At the spring bakeathon, Chuck suggested that we should store the
> > > kerberos principal in addition to the client id string in nfsdcld.  The
> > > idea is to prevent an illegitimate client from reclaiming another
> > > client's opens by supplying that client's id string.  This is an initial
> > > attempt at doing that.
> > 
> > Great to see this, thanks!
> > 
> > > Initially I had played with the idea of hanging a version number off
> > > either the cld_net or nfsd_net structure, exposing that via a proc file,
> > > and having the userspace daemon write the desired version number to the
> > > proc file prior to opening the rpc_pipefs file.  That was potentially
> > > problematic if nfsd was already trying to queue an upcall using a
> > > version that nfscld didn't support... so I like the GetVersion upcall
> > > idea better.
> > 
> > Sounds OK to me.
> > 
> > It queries the version once on nfsd startup and sticks with that
> > version.  We allow rpc.mountd to be restarted while nfsd is running.  So
> > the one case I think it wouldn't handle is the case where somebody
> > downgrades mountd while nfsd is running.
> 
> I'm assuming you meant nfsdcld rather than mountd here...

Oops, right.

> currently if
> someone were to downgrade nfsdcld while nfsd is running then that case
> wouldn't be handled, so a restart of nfsd would also be necessary.

Right.

> > My feeling is that handling that case would be way too much trouble, so
> > actually I'm going to consider that a plus.  But it might be worth
> > documenting (if only in a patch changelog).
>
> To handle that scenario, We'd need to change the database schema.  I'd
> really rather do that in an external script than stick downgrade logic
> into nfsdcld... mainly because I'd want to check if there was any data
> in the columns being eliminated and warn the user & ask for
> confirmation.  
> 
> We'd also need to change how nfsdcld reacts when it gets a message
> version it doesn't support.  Currently it just reopens the pipe file,
> which causes the upcall to fail with -EAGAIN, which causes nfsd to retry
> the upcall, and it just goes into a loop.  I could implement a "version
> not supported" downcall.  I'm not sure what error number to use... maybe
> -EPROTONOSUPP?  Also even if we implemented a "version not supported"
> downcall now, that wouldn't handle the problem with existing nfsdcld's.
> The only thing I can think of there is to add a counter to
> cld_pipe_upcall() and exit the loop after a certain number of iterations
> (10? 100? 1000?).

This sounds too complicated.  Let's just add a note that downgrades
require an nfsd restart.

> > > The second patch actually adds the v2 message, which adds the principal
> > > (actually just the first 1024 bytes) to the Cld_Create upcall and to the 
> > > Cld_GraceStart downcall (which is what loads the data in the
> > > reclaim_str_hashtbl). I couldn't really figure out what the maximum length
> > > of a kerberos principal actually is (looking at krb5.h the length field in
> > > the struct krb5_data is an unsigned int, so I guess it can be pretty big).
> > > I don't think the principal will be that large in practice, and even if
> > > it is the first 1024 bytes should be sufficient for our purposes.
> > 
> > How does it fail when principals are longer?  Does it error out, or
> > treat two principals as equal if they agree in the first 1024 bytes?
> 
> It treats them as equal.

Got it, thanks.

--b.