On Tue, Sep 22, 2020 at 2:47 PM Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>
> On 22 Sep 2020, at 12:11, Anna Schumaker wrote:
>
> > On Tue, Sep 22, 2020 at 11:53 AM Anna Schumaker
> > <anna.schumaker@xxxxxxxxxx> wrote:
> >>
> >> On Tue, Sep 22, 2020 at 11:49 AM Benjamin Coddington
> >> <bcodding@xxxxxxxxxx> wrote:
> >>>
> >>> On 22 Sep 2020, at 10:43, Anna Schumaker wrote:
> >>>
> >>>> On Tue, Sep 22, 2020 at 10:31 AM Anna Schumaker
> >>>> <anna.schumaker@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> On Tue, Sep 22, 2020 at 10:22 AM Benjamin Coddington
> >>>>> <bcodding@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On 22 Sep 2020, at 10:03, Anna Schumaker wrote:
> >>>>>>> Hi Ben,
> >>>>>>>
> >>>>>>> Once I apply this patch I have trouble with generic/478 doing lock
> >>>>>>> reclaim:
> >>>>>>>
> >>>>>>> [  937.460505] run fstests generic/478 at 2020-09-22 09:59:14
> >>>>>>> [  937.607990] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
> >>>>>>>
> >>>>>>> And the test just hangs until I kill it.
> >>>>>>>
> >>>>>>> Just thought you should know!
> >>>>>>
> >>>>>> Yes, thanks! I'm not seeing that.. I've tested these based on
> >>>>>> v5.8.4, I'll rebase and check again. I see a wirecap of generic/478
> >>>>>> is only 515K on my system, would you be willing to share a capture
> >>>>>> of your test failing?
> >>>>>
> >>>>> I have it based on v5.9-rc6 (plus the patches I have queued up for
> >>>>> v5.10), so there definitely could be a difference there! I'm using a
> >>>>> stock kernel on my server, though :)
> >>>>>
> >>>>> I can definitely get you a packet trace once I re-apply the patch and
> >>>>> rerun the test.
> >>>>
> >>>> Here's the packet trace, I reran the test with just this patch applied
> >>>> on top of v5.9-rc6 so it's not interacting with something else in my
> >>>> tree. Looks like it's ending up in an NFS4ERR_OLD_STATEID loop.
> >>>
> >>> Thanks very much!
> >>>
> >>> Did you see this failure with all three patches applied, or just with
> >>> the first patch?
> >>
> >> I saw it with the first patch applied, and with the first and third
> >> applied. I initially hit it as I was wrapping up for the day
> >> yesterday, but I left out #2 since I saw your retraction
> >
> > I reran with all three patches applied, and didn't have the issue. So
> > something in the refactor patch fixes it.
>
> That helped me see the case we're not handling correctly is when two OPENs
> race and the second one tries to update the state first and gets dropped.
> That is fixed by the 2/3 refactor patch since the refactor was being a bit
> more explicit.
>
> That means I'll need to fix those two patches and send them again. I'm very
> glad you caught this! Thanks very much for helping me find the problem.

You're welcome! I'm looking forward to the next version :)

Anna

>
> Ben
>