Re: [PATCH v2 0/9] Various NFSv4 state error handling fixes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Trond,

On Thu, Sep 19, 2019 at 7:42 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> Hi Olga
>
> On Thu, 2019-09-19 at 09:14 -0400, Olga Kornievskaia wrote:
> > Hi Trond,
> >
> > On Wed, Sep 18, 2019 at 9:49 PM Trond Myklebust <
> > trondmy@xxxxxxxxxxxxxxx> wrote:
> > > Hi Olga
> > >
> > > On Wed, 2019-09-18 at 15:38 -0400, Olga Kornievskaia wrote:
> > > > Hi Trond,
> > > >
> > > > These set of patches do not address the locking problem. It's
> > > > actually
> > > > not the locking patch (which I thought it was as I reverted it
> > > > and
> > > > still had the issue). Without the whole patch series the unlock
> > > > works
> > > > fine so something in these new patches. Something is up with the
> > > > 2
> > > > patches:
> > > > NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> > > > NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
> > > >
> > > > If I remove either one separately, unlock fails but if I remove
> > > > both
> > > > unlock works.
> > >
> > > Can you describe how you are testing this, and perhaps provide
> > > wireshark traces that show how we're triggering these problems?
> >
> > I'm triggering by running "nfstest_lock --nfsversion 4.1 --runtest
> > btest01" against either linux or ontap servers (while the test
> > doesn't
> > fail but on the network trace you can see unlock failing with
> > bad_stateid). Network trace attached.
> >
> > But actually a simple test open, lock, unlock does the trick (network
> > trace attached).
> > fd1 = open(RDWR)
> > fctl(fd1) (lock /unlock)
>
>
> These traces really do not mesh with what I'm seeing using a simple
> Connectathon lock test run. When I look at the wireshark output from
> that, I see exadtly two cases where the stateid arguments are both
> zero, and those are both SETATTR, so expected.
>
> All the LOCKU are showing up as non-zero stateids, and so I'm seeing no
> BAD_STATEID or OLD_STATEID errors at all.
>
> Is there something special about how your test is running?

There is nothing special that I can think of about my setup or how
test run. I pull from your testing branch, build it (no extra
patches). Run tests over 4.1 (default mount opts) against a linux
server (typically same kernel).

Is this patch series somewhere in your git branches? I've been testing
your testing branch (as I could see v2 changes were in the testing
branch). It's not obvious to me what was changed in v3 to see if the
testing branch has the right code.

>
> Cheers
>   Trond
>
> PS: Note: I do think I need a v3 of the LOCKU patch in order to add a
> spinlock around the new copy of the lock stateid in
> nfs4_alloc_unlockdata(). However I don't see how the missing spinlocks
> could cause you to consistently be seeing an all-zero stateid argument.

I'm not testing with v3 so I don't think that's it.

>
>
> >
> > > Thanks!
> > >   Trond
> > >
> > > > On Mon, Sep 16, 2019 at 4:46 PM Trond Myklebust <
> > > > trondmy@xxxxxxxxx>
> > > > wrote:
> > > > > Various NFSv4 fixes to ensure we handle state errors correctly.
> > > > > In
> > > > > particular, we need to ensure that for COMPOUNDs like CLOSE and
> > > > > DELEGRETURN, that may have an embedded LAYOUTRETURN, we handle
> > > > > the
> > > > > layout state errors so that a retry of either the LAYOUTRETURN,
> > > > > or
> > > > > the later CLOSE/DELEGRETURN does not corrupt the LAYOUTRETURN
> > > > > reply.
> > > > >
> > > > > Also ensure that if we get a NFS4ERR_OLD_STATEID, then we do
> > > > > our
> > > > > best to still try to destroy the state on the server, in order
> > > > > to
> > > > > avoid causing state leakage.
> > > > >
> > > > > v2: Fix bug reports from Olga
> > > > >  - Try to avoid sending old stateids on CLOSE/OPEN_DOWNGRADE
> > > > > when
> > > > >    doing fully serialised NFSv4.0.
> > > > >  - Ensure LOCKU initialises the stateid correctly.
> > > > >
> > > > > Trond Myklebust (9):
> > > > >   pNFS: Ensure we do clear the return-on-close layout stateid
> > > > > on
> > > > > fatal
> > > > >     errors
> > > > >   NFSv4: Clean up pNFS return-on-close error handling
> > > > >   NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
> > > > >   NFSv4: Handle RPC level errors in LAYOUTRETURN
> > > > >   NFSv4: Add a helper to increment stateid seqids
> > > > >   pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping
> > > > > the
> > > > > state
> > > > >     seqid
> > > > >   NFSv4: Fix OPEN_DOWNGRADE error handling
> > > > >   NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> > > > >   NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
> > > > >
> > > > >  fs/nfs/nfs4_fs.h   |  11 ++-
> > > > >  fs/nfs/nfs4proc.c  | 204 ++++++++++++++++++++++++++++++-------
> > > > > ----
> > > > > ----
> > > > >  fs/nfs/nfs4state.c |  16 ----
> > > > >  fs/nfs/pnfs.c      |  71 ++++++++++++++--
> > > > >  fs/nfs/pnfs.h      |  17 +++-
> > > > >  5 files changed, 229 insertions(+), 90 deletions(-)
> > > > >
> > > > > --
> > > > > 2.21.0
> > > > >
> > > --
> > > Trond Myklebust
> > > Linux NFS client maintainer, Hammerspace
> > > trond.myklebust@xxxxxxxxxxxxxxx
> > >
> > >
> Trond Myklebust
> CTO, Hammerspace Inc
> 4300 El Camino Real, Suite 105
> Los Altos, CA 94022
> www.hammer.space
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
>
>



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux