Re: [PATCH] nfsd: fix delegation_blocked() to block correctly for at least 30 seconds

On 15 Sep 2024, at 21:17, NeilBrown wrote:

> On Thu, 12 Sep 2024, Olga Kornievskaia wrote:
>
>> I wouldn't rule out these operations (at least not rename) as ones
>> that can represent "sharing" of files. An example workload is one
>> where a file gets generated (created, written/read over NFS) and is
>> then transferred locally to another filesystem. I can imagine a
>> pipeline where the file gets filled up, the generated data is moved
>> to be worked on elsewhere, and the same file gets filled up again. I
>> think this bug was discovered because of a setup where there was
>> heavy use of these operations (on various files) and some got
>> blocked, causing problems. For such a workload, if we are not going
>> to block giving out a delegation, do we cause too many cb_recalls?
>
> A pipeline as you describe seems to be a case of serial sharing.
> Different applications use the same file, but only at different times.
> This sort of sharing isn't hurt by delegations.
>
> The sort of sharing that might trigger excessive cb_recalls if
> delegations weren't blocked would almost certainly involve file locking
> and an expectation that two separate applications would sometimes access
> the file concurrently.  When this is happening, neither should get a
> delegation.
>
> The problem you saw was really caused by a delegation being given out
> while the rename was still happening.
> i.e.:
>   - the rename starts
>   - the delegation is detected and broken
>   - the cb_recall is sent.
>   - the client opens the file prior to returning the delegation
>   - the client gets a new delegation as part of this open
>   - the client returns the original delegation
>   - the rename loops around and finds a new delegation which it needs
>     to break.
>
> This should only loop once unless the recall takes more than 30 seconds.
> So I'm a bit perplexed that it blocked long enough to be noticed.  So
> maybe there is more going on here than I can see.  Or maybe the recall
> is really slow.
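
To make the sequence quoted above concrete, here is a small user-space toy
model in C. It is only a sketch of the race as described in the list, not
nfsd code: every name in it (toy_file, toy_client_open, toy_rename_recalls)
is made up for illustration, and the 30-second blocking window is collapsed
into a single flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-file server state; not an nfsd structure. */
struct toy_file {
	int delegations;     /* delegations currently outstanding */
	bool deleg_blocked;  /* server refuses to hand out new delegations */
};

/* Client OPEN: hands out a delegation unless delegations are blocked. */
static void toy_client_open(struct toy_file *f)
{
	if (!f->deleg_blocked)
		f->delegations++;
}

/*
 * Model one rename against a file that starts with one delegation.
 * racing_opens is how many times the client opens the file before
 * returning a recalled delegation. Returns the number of recalls the
 * rename had to issue before it could proceed.
 */
static int toy_rename_recalls(bool block_on_break, int racing_opens)
{
	struct toy_file f = { .delegations = 1, .deleg_blocked = false };
	int recalls = 0;

	while (f.delegations > 0) {
		recalls++;                    /* delegation detected, cb_recall sent */
		if (block_on_break)
			f.deleg_blocked = true;   /* assumed effect of blocking */
		if (racing_opens > 0) {
			racing_opens--;
			toy_client_open(&f);      /* open prior to returning */
		}
		f.delegations--;              /* client returns the recalled one */
	}
	return recalls;
}
```

In this model, toy_rename_recalls(false, 1) needs two recalls (the rename
loops once), while toy_rename_recalls(true, 1) needs only one, because the
racing open no longer gets a new delegation.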

When the server's local rename process calls __break_lease(), it only calls
fl_lmops->lm_break() once and sets FL_UNLOCK_PENDING.  After that it will
sleep and wake to check, but never again call ->lm_break() (which is what
causes knfsd to recall the delegation).

The check for leases_conflict() is not stateful.
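
A rough user-space sketch of that behaviour, for illustration only. This is
a reading of the description above, not a copy of fs/locks.c: the toy_*
names are invented, TOY_UNLOCK_PENDING merely stands in for
FL_UNLOCK_PENDING, and a counter stands in for the ->lm_break() callback.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_UNLOCK_PENDING 0x1u  /* stand-in for FL_UNLOCK_PENDING */
#define TOY_MAX_LEASES 4

struct toy_lease {
	bool active;
	unsigned int flags;
	int break_calls;   /* times the lm_break stand-in ran for this lease */
};

struct toy_ctx {
	struct toy_lease leases[TOY_MAX_LEASES];
};

/* Notification scan: mark each active lease and notify it once. */
static void toy_notify_scan(struct toy_ctx *ctx)
{
	for (size_t i = 0; i < TOY_MAX_LEASES; i++) {
		struct toy_lease *l = &ctx->leases[i];

		if (!l->active || (l->flags & TOY_UNLOCK_PENDING))
			continue;
		l->flags |= TOY_UNLOCK_PENDING;
		l->break_calls++;          /* stand-in for ->lm_break() */
	}
}

/* Stateless conflict check: any active lease counts, notified or not. */
static bool toy_any_conflict(const struct toy_ctx *ctx)
{
	for (size_t i = 0; i < TOY_MAX_LEASES; i++)
		if (ctx->leases[i].active)
			return true;
	return false;
}

/*
 * Scenario: a new lease appears while the breaker sleeps. Returns how
 * many times that late lease was notified, or -1 if no conflict remains.
 */
static int toy_late_lease_break_calls(void)
{
	struct toy_ctx ctx = {0};

	ctx.leases[0].active = true;
	toy_notify_scan(&ctx);         /* runs once, before the wait loop */

	ctx.leases[1].active = true;   /* new delegation handed out meanwhile */
	ctx.leases[0].active = false;  /* original delegation returned */

	if (!toy_any_conflict(&ctx))   /* breaker wakes and re-checks */
		return -1;
	return ctx.leases[1].break_calls;
}
```

Here toy_late_lease_break_calls() returns 0: the waiter still sees a
conflict after waking, but the lease that arrived late was never told to
go away, so in this model nothing triggers a recall for it.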

Ben




