On Mon, 2018-11-12 at 10:16 -0800, Chuck Lever wrote:
> > On Nov 12, 2018, at 9:59 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> >
> > On Sat, 2018-11-10 at 16:49 -0500, Bruce Fields wrote:
> > > Looks like it's the fault of
> > >
> > > 	07d02a67b7faae "SUNRPC: Simplify lookup code"
> >
> > I'm having trouble reproducing this bug. I've tried both cthon and
> > xfstests in a loop, so far without success (both NFSv3 and v4.1, but
> > only sec=sys). Is there anything else you're doing that I might try?
> >
> > e.g. Are you running multiple workloads in parallel? Different
> > users?..
>
> Some observations, for what they are worth:
>
> Single user test running with no other NFS workload.
>
> I see the BUG fire at umount time, not during the test.
>
> My client is a two-node NUMA system with 12 cores, which
> could be more likely to trigger races.
>
> Export is tmpfs.
>

Thanks! That's useful info. Particularly the observation that you're
seeing it at umount time...

>
> > > --b.
> > >
> > > On Fri, Nov 09, 2018 at 01:01:30PM -0500, Chuck Lever wrote:
> > > > > On Nov 8, 2018, at 4:44 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > Since -rc1 my regression tests crash my client. Is this a known
> > > > > problem? I'll investigate some more, I haven't even looked at
> > > > > the code yet or checked which test exactly is hitting this.
> > > > >
> > > > > --b.
> > > > >
> > > > > [ 164.109570] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > > > > [ 164.111207] PGD 0 P4D 0
> > > > > [ 164.111528] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > [ 164.112303] CPU: 2 PID: 2947 Comm: kworker/u8:5 Not tainted 4.20.0-rc1-13223-gafb6d1c474ef #1898
> > > > > [ 164.113487] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
> > > > > [ 164.115301] Workqueue: rpciod rpc_async_schedule [sunrpc]
> > > > > [ 164.115920] RIP: 0010:rpcauth_lookup_credcache+0x3d/0x450 [sunrpc]
> > > > > [ 164.116700] Code: 89 f5 41 54 41 89 d4 53 48 83 ec 38 89 4d b0 4c 8b 7f 20 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 8d 45 c0 48 89 45 c8 <41> 8b 77 08 48 89 45 c0 48 8b 47 10 4c 89 ef 48 8b 40 28 e8 cb d2
> > > > > [ 164.119299] RSP: 0018:ffffc90001ee3cf0 EFLAGS: 00010246
> > > > > [ 164.119872] RAX: ffffc90001ee3d10 RBX: ffff88007cc18180 RCX: 0000000000600040
> > > > > [ 164.120800] RDX: 0000000000000001 RSI: ffffc90001ee3d60 RDI: ffff88007cafb198
> > > > > [ 164.121643] RBP: ffffc90001ee3d50 R08: 0000000000000000 R09: 0000000000000000
> > > > > [ 164.122464] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> > > > > [ 164.123373] R13: ffffc90001ee3d60 R14: ffff88007cafb198 R15: 0000000000000000
> > > > > [ 164.124296] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
> > > > > [ 164.125322] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [ 164.126006] CR2: 0000000000000008 CR3: 000000007829c003 CR4: 00000000001606e0
> > > > > [ 164.126860] Call Trace:
> > > > > [ 164.127045] ? call_retry_reserve+0x30/0x30 [sunrpc]
> > > > > [ 164.127622] rpcauth_lookupcred+0xa0/0xc0 [sunrpc]
> > > > > [ 164.128200] rpcauth_refreshcred+0x15f/0x170 [sunrpc]
> > > > > [ 164.128807] __rpc_execute+0xa9/0x460 [sunrpc]
> > > > > [ 164.129281] process_one_work+0x227/0x630
> > > > > [ 164.129684] worker_thread+0x3c/0x390
> > > > > [ 164.130062] ? process_one_work+0x630/0x630
> > > > > [ 164.130609] kthread+0x11d/0x140
> > > > > [ 164.130936] ? kthread_park+0x80/0x80
> > > > > [ 164.131339] ret_from_fork+0x3a/0x50
> > > > > [ 164.131676] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs lockd grace auth_rpcgss sunrpc
> > > > > [ 164.132719] CR2: 0000000000000008
> > > > > [ 164.133050] ---[ end trace b4028a6781a696ad ]---
> > > > >
> > > >
> > > > I just encountered this repeatedly with cthon04 general tests.
> > > >
> > > > MNTOPTIONS="rw,proto=tcp,vers=4.1,sec=sys"
> > > >
> > > >
> > > > --
> > > > Chuck Lever
> > > > chucklever@xxxxxxxxx
> > > >
> > > >
> > --
> > Trond Myklebust
> > CTO, Hammerspace Inc
> > 4300 El Camino Real, Suite 105
> > Los Altos, CA 94022
> > www.hammer.space
>
> --
> Chuck Lever
> chucklever@xxxxxxxxx
>
>
>
--
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
www.hammer.space