> On Nov 12, 2018, at 4:08 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, 2018-11-12 at 16:00 -0800, Chuck Lever wrote:
>>> On Nov 12, 2018, at 3:57 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>
>>>>> On Mon, 2018-11-12 at 18:01 -0500, bfields@xxxxxxxxxxxx wrote:
>>>>> On Mon, Nov 12, 2018 at 09:17:16PM +0000, Trond Myklebust wrote:
>>>>>> On Mon, 2018-11-12 at 13:24 -0500, bfields@xxxxxxxxxxxx wrote:
>>>>>>> On Mon, Nov 12, 2018 at 05:59:33PM +0000, Trond Myklebust wrote:
>>>>>>>> On Sat, 2018-11-10 at 16:49 -0500, Bruce Fields wrote:
>>>>>>>> Looks like it's the fault of
>>>>>>>>
>>>>>>>> 07d02a67b7faae "SUNRPC: Simplify lookup code"
>>>>>>>
>>>>>>> I'm having trouble reproducing this bug. I've tried both cthon
>>>>>>> and xfstests in a loop, so far without success (both NFSv3 and
>>>>>>> v4.1, but only sec=sys). Is there anything else you're doing
>>>>>>> that I might try?
>>>>>>>
>>>>>>> e.g. Are you running multiple workloads in parallel? Different
>>>>>>> users?..
>>>>>>
>>>>>> Nothing that interesting. Currently it's connectathon over v4,
>>>>>> v3, v4/krb5, v3/krb5, v4/krb5i, v4/krb5p, v4.1, v4.1/krb5, but
>>>>>> just serially one after the other. Then some pynfs tests (which
>>>>>> bypass the client), then xfstests over v4.2/sys. And also a few
>>>>>> one-off locking tests of my own that probably aren't a factor
>>>>>> here.
>>>>>>
>>>>>> (Hah, I just realized I was mounting with vers=4 and assuming
>>>>>> that meant 4.0, but actually it's changed over time depending on
>>>>>> the defaults, so currently those "v4" runs are actually all 4.2.
>>>>>> Gah.)
>>>>>
>>>>> Are you perhaps both using RPCSEC_GSS w/ integrity checking for
>>>>> your EXCHANGE_ID authentication? The client will attempt to use
>>>>> that by default if rpc.gssd is running.
>>>>
>>>> Yes, in addition to the krb5i mount I'd expect the sys/krb5/krb5p
>>>> mounts are using krb5i for EXCHANGE_ID.
>>>>
>>>>> I ask because I think the issue might be with RPCSEC_GSS,
>>>>> specifically with the RPCSEC_GSS context destroy code, hence the
>>>>> 2 patches that I just sent out.
>>>>
>>>> Looks like my tests pass after applying those two patches.
>>>>
>>>
>>> Cool! Thanks for testing.
>>>
>>> Chuck, do you think the above might also explain your sighting of
>>> the same Oops?
>>
>> Could be, I don’t think I saw it until I started testing NFSv4.
>> I won’t be able to confirm that until next week.
>>
>
> OK. Either way, I know that part of the GSS code needs to be fixed in
> order to deal with the reference count being 0, so I think it is worth
> merging this patch now, and then we can see if there is more to the
> regression when you can get back to your test rig.

Sounds fine to me.

> Thanks
> Trond
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx