Re: [PATCH v2 0/9] Various NFSv4 state error handling fixes

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Fri, 20 Sep 2019 14:54:27 +0000

On Fri, 2019-09-20 at 10:25 -0400, Olga Kornievskaia wrote:
> Hi Trond,
> 
> On Thu, Sep 19, 2019 at 7:42 PM Trond Myklebust <
> trondmy@xxxxxxxxxxxxxxx> wrote:
> > Hi Olga
> > 
> > On Thu, 2019-09-19 at 09:14 -0400, Olga Kornievskaia wrote:
> > > Hi Trond,
> > > 
> > > On Wed, Sep 18, 2019 at 9:49 PM Trond Myklebust <
> > > trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > Hi Olga
> > > > 
> > > > On Wed, 2019-09-18 at 15:38 -0400, Olga Kornievskaia wrote:
> > > > > Hi Trond,
> > > > > 
> > > > > This set of patches does not address the locking problem. It's
> > > > > actually not the locking patch (which I thought it was, as I
> > > > > reverted it and still had the issue). Without the whole patch
> > > > > series the unlock works fine, so it's something in these new
> > > > > patches. Something is up with the 2 patches:
> > > > > NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> > > > > NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
> > > > > 
> > > > > If I remove either one separately, unlock fails, but if I
> > > > > remove both, unlock works.
> > > > 
> > > > Can you describe how you are testing this, and perhaps provide
> > > > wireshark traces that show how we're triggering these problems?
> > > 
> > > I'm triggering it by running "nfstest_lock --nfsversion 4.1
> > > --runtest btest01" against either linux or ontap servers (the test
> > > itself doesn't fail, but in the network trace you can see unlock
> > > failing with bad_stateid). Network trace attached.
> > > 
> > > But actually a simple open, lock, unlock test does the trick
> > > (network trace attached).
> > > fd1 = open(RDWR)
> > > fcntl(fd1) (lock / unlock)
> > 
> > These traces really do not mesh with what I'm seeing using a simple
> > Connectathon lock test run. When I look at the wireshark output from
> > that, I see exactly two cases where the stateid arguments are both
> > zero, and those are both SETATTR, so expected.
> > 
> > All the LOCKU are showing up as non-zero stateids, and so I'm seeing
> > no BAD_STATEID or OLD_STATEID errors at all.
> > 
> > Is there something special about how your test is running?
> 
> There is nothing special that I can think of about my setup or how the
> test is run. I pull from your testing branch and build it (no extra
> patches), then run tests over 4.1 (default mount opts) against a linux
> server (typically same kernel).
> 
> Is this patch series somewhere in your git branches? I've been testing
> your testing branch (as I could see v2 changes were in the testing
> branch). It's not obvious to me what was changed in v3 to see if the
> testing branch has the right code.

I hadn't yet updated the testing branch with the v3 code. Pushed out
now as a forced-update.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx