Re: [PATCH 1/3] NFSv4: Fix a livelock when CLOSE pre-emptively bumps state sequence

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 22, 2020 at 2:47 PM Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>
> On 22 Sep 2020, at 12:11, Anna Schumaker wrote:
>
> > On Tue, Sep 22, 2020 at 11:53 AM Anna Schumaker
> > <anna.schumaker@xxxxxxxxxx> wrote:
> >>
> >> On Tue, Sep 22, 2020 at 11:49 AM Benjamin Coddington
> >> <bcodding@xxxxxxxxxx> wrote:
> >>>
> >>> On 22 Sep 2020, at 10:43, Anna Schumaker wrote:
> >>>
> >>>> On Tue, Sep 22, 2020 at 10:31 AM Anna Schumaker
> >>>> <anna.schumaker@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> On Tue, Sep 22, 2020 at 10:22 AM Benjamin Coddington
> >>>>> <bcodding@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On 22 Sep 2020, at 10:03, Anna Schumaker wrote:
> >>>>>>> Hi Ben,
> >>>>>>>
> >>>>>>> Once I apply this patch I have trouble with generic/478 doing lock
> >>>>>>> reclaim:
> >>>>>>>
> >>>>>>> [  937.460505] run fstests generic/478 at 2020-09-22 09:59:14
> >>>>>>> [  937.607990] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
> >>>>>>>
> >>>>>>> And the test just hangs until I kill it.
> >>>>>>>
> >>>>>>> Just thought you should know!
> >>>>>>
> >>>>>> Yes, thanks!  I'm not seeing that..  I've tested these based on
> >>>>>> v5.8.4, I'll
> >>>>>> rebase and check again.  I see a wirecap of generic/478 is only 515K
> >>>>>> on my
> >>>>>> system, would you be willing to share a capture of your test
> >>>>>> failing?
> >>>>>
> >>>>> I have it based on v5.9-rc6 (plus the patches I have queued up for
> >>>>> v5.10), so there definitely could be a difference there! I'm using a
> >>>>> stock kernel on my server, though :)
> >>>>>
> >>>>> I can definitely get you a packet trace once I re-apply the patch and
> >>>>> rerun the test.
> >>>>
> >>>> Here's the packet trace, I reran the test with just this patch applied
> >>>> on top of v5.9-rc6 so it's not interacting with something else in my
> >>>> tree. Looks like it's ending up in an NFS4ERR_OLD_STATEID loop.
> >>>
> >>> Thanks very much!
> >>>
> >>> Did you see this failure with all three patches applied, or just with
> >>> the
> >>> first patch?
> >>
> >> I saw it with the first patch applied, and with the first and third
> >> applied. I initially hit it as I was wrapping up for the day
> >> yesterday, but I left out #2 since I saw your retraction
> >
> > I reran with all three patches applied, and didn't have the issue. So
> > something in the refactor patch fixes it.
>
> That helped me see the case we're not handling correctly is when two OPENs
> race and the second one tries to update the state first and gets dropped.
> That is fixed by the 2/3 refactor patch since the refactor was being a bit
> more explicit.
>
> That means I'll need to fix those two patches and send them again.  I'm very
> glad you caught this!  Thanks very much for helping me find the problem.

You're welcome! I'm looking forward to the next version :)

Anna

>
> Ben
>



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux