On 15 Sep 2024, at 21:17, NeilBrown wrote:

> On Thu, 12 Sep 2024, Olga Kornievskaia wrote:
>
>> I wouldn't dismiss these operations (rename, at least) as ones that
>> can't represent "sharing" of files. An example workload is where a
>> file gets generated, created, written/read over NFS, but then locally
>> transferred to another filesystem. I can imagine a pipeline where the
>> file gets filled up, the generated data is moved to be worked on
>> elsewhere, and the same file gets filled up again. I think this bug
>> was discovered because of a setup where there was heavy use of these
>> operations (on various files) and some got blocked, causing problems.
>> For such a workload, if we are not going to block giving out a
>> delegation, do we cause too many cb_recalls?
>
> A pipeline as you describe seems to be a case of serial sharing.
> Different applications use the same file, but only at different times.
> This sort of sharing isn't hurt by delegations.
>
> The sort of sharing that might trigger excessive cb_recalls if
> delegations weren't blocked would almost certainly involve file locking
> and an expectation that two separate applications would sometimes access
> the file concurrently. When this is happening, neither should get a
> delegation.
>
> The problem you saw was really caused by a delegation being given out
> while the rename was still happening, i.e.:
> - the rename starts
> - the delegation is detected and broken
> - the cb_recall is sent
> - the client opens the file prior to returning the delegation
> - the client gets a new delegation as part of this open
> - the client returns the original delegation
> - the rename loops around and finds a new delegation which it needs
>   to break.
>
> This should only loop once unless the recall takes more than 30 seconds.
> So I'm a bit perplexed that it blocked long enough to be noticed. So
> maybe there is more going on here than I can see. Or maybe the recall
> is really slow.

When the server's local rename process calls __break_lease(), it only
calls fl_lmops->lm_break() once and sets FL_UNLOCK_PENDING. After that it
sleeps and wakes to re-check, but it never calls ->lm_break() again, and
->lm_break() is what makes knfsd recall the delegation. The
leases_conflict() check is not stateful, so a delegation handed out after
that first break is never recalled, and the rename can be left waiting
on it.

Ben