Re: [PATCH] nfsd: fix delegation_blocked() to block correctly for at least 30 seconds

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/10/2024 8:52 AM, Jeff Layton wrote:
On Tue, 2024-09-10 at 08:32 -0400, Tom Talpey wrote:
On 9/9/2024 11:02 AM, Olga Kornievskaia wrote:
On Mon, Sep 9, 2024 at 10:38 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:

On Mon, Sep 09, 2024 at 10:17:42AM -0400, Olga Kornievskaia wrote:
On Mon, Sep 9, 2024 at 8:24 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:

On Mon, 2024-09-09 at 15:06 +1000, NeilBrown wrote:
The pair of bloom filtered used by delegation_blocked() was intended to
block delegations on given filehandles for between 30 and 60 seconds.  A
new filehandle would be recorded in the "new" bit set.  That would then
be switch to the "old" bit set between 0 and 30 seconds later, and it
would remain as the "old" bit set for 30 seconds.


Since we're on the subject...

60s seems like an awfully long time to block delegations on an inode.
Recalls generally don't take more than a few seconds when things are
functioning properly.

Should we swap the bloom filters more often?

I was also thinking that perhaps we can do 15-30s perhaps? Another
thought was should this be a configurable value (with some
non-negotiable min and max)?

If it needs to be configurable, then we haven't done our jobs as
system architects. Temporarily blocking delegation ought to be
effective without human intervention IMHO.

At least let's get some specific usage scenarios that demonstrate a
palpable need for enabling an admin to adjust this behavior (ie, a
need that is not simply an implementation bug), then design for
those specific cases.

Does NFSD have metrics in this area, for example?

Generally speaking, though, I'm not opposed to finessing the behavior
of the Bloom filter. I'd like to apply the patch below for v6.12,

100% agreed that we need this patch to go in now. The configuration
was just a thought for after which I should have stated explicitly. I
guess I'm not a big fan of hard coded numbers in the code and naively
thinking that it's always better to have a config over a hardcoded
value.

No constant is ever correct in networking, especially timeouts. So yes,
it should be adjustable. But even then, choosing a number here is
fundamentally difficult.

Delegations can block for perfectly valid long periods, right? Say it
takes a long time to flush a write delegation, or if the network is
partitioned to the (other) client being recalled. 30 seconds to data
corruption is quite the guillotine.


I don't think this is danger of data corruption here. The bloom filter
is there to keep the server from handing out a new delegation too
quickly after having to recall one. Allowing no delegations for 30-60s
seems a bit too cautious, IMO.

Agreed that the server is doing the right thing, it's about when the
hammer comes down, and again, any constant is inevitably going to be
wrong, sometime or somewhere.

Ideally it seems like we'd want this to be some function of the delay
between the server issuing the CB_RECALL, and the client doing the
DELEGRETURN.

That value is obviously highly variable, but it would be an interesting
statistic to collect, and might help inform what the bloom filter delay
should be.

It would, but it seems it would have to be per-client and per-workload.
There is definitely no single value. Some clients may take a caching
read delegation "just because", and return it promptly. Others may
be at the end of a skinny/flaky link and want to buffer writes. These
are going to have wildly different delegreturn latencies.

Tom.


preserving the Cc: stable, but handle the behavioral finessing via
a subsequent patch targeting v6.13 so that can be appropriately
reviewed and tested. Ja?

BTW, nice catch!


Unfortunately the code intended to clear the old bit set once it reached
30 seconds old, preparing it to be the next new bit set, instead cleared
the *new* bit set before switching it to be the old bit set.  This means
that the "old" bit set is always empty and delegations are blocked
between 0 and 30 seconds.

This patch updates bd->new before clearing the set with that index,
instead of afterwards.

Reported-by: Olga Kornievskaia <okorniev@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Fixes: 6282cd565553 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.")
Signed-off-by: NeilBrown <neilb@xxxxxxx>
---
   fs/nfsd/nfs4state.c | 5 +++--
   1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 4313addbe756..6f18c1a7af2e 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1078,7 +1078,8 @@ static void nfs4_free_deleg(struct nfs4_stid *stid)
    * When a delegation is recalled, the filehandle is stored in the "new"
    * filter.
    * Every 30 seconds we swap the filters and clear the "new" one,
- * unless both are empty of course.
+ * unless both are empty of course.  This results in delegations for a
+ * given filehandle being blocked for between 30 and 60 seconds.
    *
    * Each filter is 256 bits.  We hash the filehandle to 32bit and use the
    * low 3 bytes as hash-table indices.
@@ -1107,9 +1108,9 @@ static int delegation_blocked(struct knfsd_fh *fh)
                if (ktime_get_seconds() - bd->swap_time > 30) {
                        bd->entries -= bd->old_entries;
                        bd->old_entries = bd->entries;
+                     bd->new = 1-bd->new;
                        memset(bd->set[bd->new], 0,
                               sizeof(bd->set[0]));
-                     bd->new = 1-bd->new;
                        bd->swap_time = ktime_get_seconds();
                }
                spin_unlock(&blocked_delegations_lock);

--
Jeff Layton <jlayton@xxxxxxxxxx>



--
Chuck Lever










[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux