Re: [PATCH] nfsd: fix delegation_blocked() to block correctly for at least 30 seconds

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2024-09-10 at 08:32 -0400, Tom Talpey wrote:
> On 9/9/2024 11:02 AM, Olga Kornievskaia wrote:
> > On Mon, Sep 9, 2024 at 10:38 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> > > 
> > > On Mon, Sep 09, 2024 at 10:17:42AM -0400, Olga Kornievskaia wrote:
> > > > On Mon, Sep 9, 2024 at 8:24 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > > 
> > > > > On Mon, 2024-09-09 at 15:06 +1000, NeilBrown wrote:
> > > > > > The pair of bloom filtered used by delegation_blocked() was intended to
> > > > > > block delegations on given filehandles for between 30 and 60 seconds.  A
> > > > > > new filehandle would be recorded in the "new" bit set.  That would then
> > > > > > be switch to the "old" bit set between 0 and 30 seconds later, and it
> > > > > > would remain as the "old" bit set for 30 seconds.
> > > > > > 
> > > > > 
> > > > > Since we're on the subject...
> > > > > 
> > > > > 60s seems like an awfully long time to block delegations on an inode.
> > > > > Recalls generally don't take more than a few seconds when things are
> > > > > functioning properly.
> > > > > 
> > > > > Should we swap the bloom filters more often?
> > > > 
> > > > I was also thinking that perhaps we can do 15-30s perhaps? Another
> > > > thought was should this be a configurable value (with some
> > > > non-negotiable min and max)?
> > > 
> > > If it needs to be configurable, then we haven't done our jobs as
> > > system architects. Temporarily blocking delegation ought to be
> > > effective without human intervention IMHO.
> > > 
> > > At least let's get some specific usage scenarios that demonstrate a
> > > palpable need for enabling an admin to adjust this behavior (ie, a
> > > need that is not simply an implementation bug), then design for
> > > those specific cases.
> > > 
> > > Does NFSD have metrics in this area, for example?
> > > 
> > > Generally speaking, though, I'm not opposed to finessing the behavior
> > > of the Bloom filter. I'd like to apply the patch below for v6.12,
> > 
> > 100% agreed that we need this patch to go in now. The configuration
> > was just a thought for after which I should have stated explicitly. I
> > guess I'm not a big fan of hard coded numbers in the code and naively
> > thinking that it's always better to have a config over a hardcoded
> > value.
> 
> No constant is ever correct in networking, especially timeouts. So yes,
> it should be adjustable. But even then, choosing a number here is
> fundamentally difficult.
> 
> Delegations can block for perfectly valid long periods, right? Say it
> takes a long time to flush a write delegation, or if the network is
> partitioned to the (other) client being recalled. 30 seconds to data
> corruption is quite the guillotine.
> 

I don't think this is danger of data corruption here. The bloom filter
is there to keep the server from handing out a new delegation too
quickly after having to recall one. Allowing no delegations for 30-60s
seems a bit too cautious, IMO. 

Ideally it seems like we'd want this to be some function of the delay
between the server issuing the CB_RECALL, and the client doing the
DELEGRETURN.

That value is obviously highly variable, but it would be an interesting
statistic to collect, and might help inform what the bloom filter delay
should be.

> > > preserving the Cc: stable, but handle the behavioral finessing via
> > > a subsequent patch targeting v6.13 so that can be appropriately
> > > reviewed and tested. Ja?
> > > 
> > > BTW, nice catch!
> > > 
> > > 
> > > > > > Unfortunately the code intended to clear the old bit set once it reached
> > > > > > 30 seconds old, preparing it to be the next new bit set, instead cleared
> > > > > > the *new* bit set before switching it to be the old bit set.  This means
> > > > > > that the "old" bit set is always empty and delegations are blocked
> > > > > > between 0 and 30 seconds.
> > > > > > 
> > > > > > This patch updates bd->new before clearing the set with that index,
> > > > > > instead of afterwards.
> > > > > > 
> > > > > > Reported-by: Olga Kornievskaia <okorniev@xxxxxxxxxx>
> > > > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > > > Fixes: 6282cd565553 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.")
> > > > > > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > > > > > ---
> > > > > >   fs/nfsd/nfs4state.c | 5 +++--
> > > > > >   1 file changed, 3 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > index 4313addbe756..6f18c1a7af2e 100644
> > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > > @@ -1078,7 +1078,8 @@ static void nfs4_free_deleg(struct nfs4_stid *stid)
> > > > > >    * When a delegation is recalled, the filehandle is stored in the "new"
> > > > > >    * filter.
> > > > > >    * Every 30 seconds we swap the filters and clear the "new" one,
> > > > > > - * unless both are empty of course.
> > > > > > + * unless both are empty of course.  This results in delegations for a
> > > > > > + * given filehandle being blocked for between 30 and 60 seconds.
> > > > > >    *
> > > > > >    * Each filter is 256 bits.  We hash the filehandle to 32bit and use the
> > > > > >    * low 3 bytes as hash-table indices.
> > > > > > @@ -1107,9 +1108,9 @@ static int delegation_blocked(struct knfsd_fh *fh)
> > > > > >                if (ktime_get_seconds() - bd->swap_time > 30) {
> > > > > >                        bd->entries -= bd->old_entries;
> > > > > >                        bd->old_entries = bd->entries;
> > > > > > +                     bd->new = 1-bd->new;
> > > > > >                        memset(bd->set[bd->new], 0,
> > > > > >                               sizeof(bd->set[0]));
> > > > > > -                     bd->new = 1-bd->new;
> > > > > >                        bd->swap_time = ktime_get_seconds();
> > > > > >                }
> > > > > >                spin_unlock(&blocked_delegations_lock);
> > > > > 
> > > > > --
> > > > > Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > 
> > > > 
> > > 
> > > --
> > > Chuck Lever
> > > 
> > 
> > 
> 

-- 
Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux