On Tue, Sep 22, 2020 at 11:53 AM Anna Schumaker <anna.schumaker@xxxxxxxxxx> wrote:
>
> On Tue, Sep 22, 2020 at 11:49 AM Benjamin Coddington
> <bcodding@xxxxxxxxxx> wrote:
> >
> > On 22 Sep 2020, at 10:43, Anna Schumaker wrote:
> >
> > > On Tue, Sep 22, 2020 at 10:31 AM Anna Schumaker
> > > <anna.schumaker@xxxxxxxxxx> wrote:
> > >>
> > >> On Tue, Sep 22, 2020 at 10:22 AM Benjamin Coddington
> > >> <bcodding@xxxxxxxxxx> wrote:
> > >>>
> > >>> On 22 Sep 2020, at 10:03, Anna Schumaker wrote:
> > >>>> Hi Ben,
> > >>>>
> > >>>> Once I apply this patch I have trouble with generic/478 doing lock
> > >>>> reclaim:
> > >>>>
> > >>>> [ 937.460505] run fstests generic/478 at 2020-09-22 09:59:14
> > >>>> [ 937.607990] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
> > >>>>
> > >>>> And the test just hangs until I kill it.
> > >>>>
> > >>>> Just thought you should know!
> > >>>
> > >>> Yes, thanks! I'm not seeing that.. I've tested these based on
> > >>> v5.8.4, I'll rebase and check again. I see a wirecap of
> > >>> generic/478 is only 515K on my system, would you be willing to
> > >>> share a capture of your test failing?
> > >>
> > >> I have it based on v5.9-rc6 (plus the patches I have queued up for
> > >> v5.10), so there definitely could be a difference there! I'm using a
> > >> stock kernel on my server, though :)
> > >>
> > >> I can definitely get you a packet trace once I re-apply the patch and
> > >> rerun the test.
> > >
> > > Here's the packet trace, I reran the test with just this patch applied
> > > on top of v5.9-rc6 so it's not interacting with something else in my
> > > tree. Looks like it's ending up in an NFS4ERR_OLD_STATEID loop.
> >
> > Thanks very much!
> >
> > Did you see this failure with all three patches applied, or just with
> > the first patch?
>
> I saw it with the first patch applied, and with the first and third
> applied. I initially hit it as I was wrapping up for the day
> yesterday, but I left out #2 since I saw your retraction

I reran with all three patches applied, and didn't have the issue. So
something in the refactor patch fixes it.

Anna

> >
> > I see the client get two OPEN responses, but then is sending
> > TEST_STATEID with the first seqid. Seems like seqid 2 is getting lost.
> > I wonder if we're making a bad assumption that NFS_OPEN_STATE can only
> > be toggled under the so_lock.
> >
> > Ben
> >
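
For anyone following the seqid discussion above: the symptom Ben describes
(two OPEN replies, then TEST_STATEID sent with the first seqid) is what you
would expect if the client's cached open stateid is never advanced to the
newer seqid. Below is a minimal userspace sketch of the "only accept a newer
seqid" rule, not the kernel code and not a diagnosis of this bug. The struct
and function names are hypothetical, made up for illustration, and locking is
left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stateid: a 32-bit seqid plus 12 opaque bytes. */
struct sketch_stateid {
	uint32_t seqid;
	unsigned char other[12];
};

/* True when a and b refer to the same state (same "other" field). */
static bool sketch_match_other(const struct sketch_stateid *a,
			       const struct sketch_stateid *b)
{
	return memcmp(a->other, b->other, sizeof(a->other)) == 0;
}

/*
 * True when a's seqid is newer than b's. The signed difference keeps the
 * comparison correct across 32-bit wraparound (assumes the usual
 * two's-complement conversion to int32_t).
 */
static bool sketch_is_newer(const struct sketch_stateid *a,
			    const struct sketch_stateid *b)
{
	return (int32_t)(a->seqid - b->seqid) > 0;
}

/*
 * Record the stateid from an OPEN reply. A reply for the same state that
 * is not newer than the cached copy is ignored. Returns true when the
 * cached stateid was updated.
 */
static bool sketch_update_open_stateid(struct sketch_stateid *cached,
				       const struct sketch_stateid *reply)
{
	if (sketch_match_other(cached, reply) &&
	    !sketch_is_newer(reply, cached))
		return false;
	*cached = *reply;
	return true;
}
```

In this sketch, if the reply carrying seqid 2 is dropped for any reason, the
cached copy stays at seqid 1 and every later request reuses it, which a
server would be expected to answer with NFS4ERR_OLD_STATEID each time.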