On Sun, 2023-03-12 at 17:33 +0200, Amir Goldstein wrote:
> On Fri, Mar 3, 2023 at 4:54 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
> >
> > > On Mar 3, 2023, at 7:15 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > >
> > > I sent the first patch in this series the other day, but didn't get any
> > > responses.
> >
> > We'll have to work out who will take which patches in this set.
> > Once fully reviewed, I can take the set if the client maintainers
> > send Acks for 2-4 and 6-7.
> >
> > nfsd-next for v6.4 is not yet open. I can work on setting that up
> > today.
> >
> > > Since then I've had time to follow up on the client-side part
> > > of this problem, which eventually also pointed out yet another bug on
> > > the server side. There are also a couple of cleanup patches in here too,
> > > and a patch to add some tracepoints that I found useful while diagnosing
> > > this.
> > >
> > > With this set on both client and server, I'm now able to run Yongcheng's
> > > test for an hour straight with no stuck locks.
>
> My nfstest_lock test occasionally gets into an endless wait loop for the lock in
> one of the optests.
>
> AFAIK, this started happening after I upgraded my client machine to v5.15.88.
> Does this seem related to the client bug fixes in this patch set?
>
> If so, is this bug a regression? and why are the fixes aimed for v6.4?
>

Most of this (lockd) code hasn't changed in well over a decade, so if
this is a regression then it's a very old one. I suppose it's possible
that this regressed after the BKL was removed from this code, but that
was a long time ago now and I'm not sure I can identify a commit that
this fixes.

I'm fine with this going in sooner than v6.4, but given that this has
been broken so long, I didn't see the need to rush.

Cheers,
--
Jeff Layton <jlayton@xxxxxxxxxx>