> On Nov 12, 2018, at 9:59 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Sat, 2018-11-10 at 16:49 -0500, Bruce Fields wrote:
>> Looks like it's the fault of
>>
>> 07d02a67b7faae "SUNRPC: Simplify lookup code"
>
> I'm having trouble reproducing this bug. I've tried both cthon and
> xfstests in a loop, so far without success (both NFSv3 and v4.1, but
> only sec=sys). Is there anything else you're doing that I might try?
>
> e.g. Are you running multiple workloads in parallel? Different users?..

Some observations, for what they are worth:

Single user test running with no other NFS workload.

I see the BUG fire at umount time, not during the test.

My client is a two-node NUMA system with 12 cores, which could be
more likely to trigger races.

Export is tmpfs.

>> --b.
>>
>> On Fri, Nov 09, 2018 at 01:01:30PM -0500, Chuck Lever wrote:
>>>
>>>> On Nov 8, 2018, at 4:44 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx
>>>>> wrote:
>>>>
>>>> Since -rc1 my regression tests crash my client. Is this a known
>>>> problem? I'll investigate some more, I haven't even looked at
>>>> the code
>>>> yet or checked which test exactly is hitting this.
>>>>
>>>> --b.
>>>>
>>>> [ 164.109570] BUG: unable to handle kernel NULL pointer
>>>> dereference at 0000000000000008
>>>> [ 164.111207] PGD 0 P4D 0
>>>> [ 164.111528] Oops: 0000 [#1] PREEMPT SMP PTI
>>>> [ 164.112303] CPU: 2 PID: 2947 Comm: kworker/u8:5 Not tainted
>>>> 4.20.0-rc1-13223-gafb6d1c474ef #1898
>>>> [ 164.113487] Hardware name: QEMU Standard PC (i440FX + PIIX,
>>>> 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-
>>>> 1.fc28 04/01/2014
>>>> [ 164.115301] Workqueue: rpciod rpc_async_schedule [sunrpc]
>>>> [ 164.115920] RIP: 0010:rpcauth_lookup_credcache+0x3d/0x450
>>>> [sunrpc]
>>>> [ 164.116700] Code: 89 f5 41 54 41 89 d4 53 48 83 ec 38 89 4d b0
>>>> 4c 8b 7f 20 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 8d 45
>>>> c0 48 89 45 c8 <41> 8b 77 08 48 89 45 c0 48 8b 47 10 4c 89 ef 48
>>>> 8b 40 28 e8 cb d2
>>>> [ 164.119299] RSP: 0018:ffffc90001ee3cf0 EFLAGS: 00010246
>>>> [ 164.119872] RAX: ffffc90001ee3d10 RBX: ffff88007cc18180 RCX:
>>>> 0000000000600040
>>>> [ 164.120800] RDX: 0000000000000001 RSI: ffffc90001ee3d60 RDI:
>>>> ffff88007cafb198
>>>> [ 164.121643] RBP: ffffc90001ee3d50 R08: 0000000000000000 R09:
>>>> 0000000000000000
>>>> [ 164.122464] R10: 0000000000000000 R11: 0000000000000000 R12:
>>>> 0000000000000001
>>>> [ 164.123373] R13: ffffc90001ee3d60 R14: ffff88007cafb198 R15:
>>>> 0000000000000000
>>>> [ 164.124296] FS: 0000000000000000(0000)
>>>> GS:ffff88007fd00000(0000) knlGS:0000000000000000
>>>> [ 164.125322] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 164.126006] CR2: 0000000000000008 CR3: 000000007829c003 CR4:
>>>> 00000000001606e0
>>>> [ 164.126860] Call Trace:
>>>> [ 164.127045] ? call_retry_reserve+0x30/0x30 [sunrpc]
>>>> [ 164.127622] rpcauth_lookupcred+0xa0/0xc0 [sunrpc]
>>>> [ 164.128200] rpcauth_refreshcred+0x15f/0x170 [sunrpc]
>>>> [ 164.128807] __rpc_execute+0xa9/0x460 [sunrpc]
>>>> [ 164.129281] process_one_work+0x227/0x630
>>>> [ 164.129684] worker_thread+0x3c/0x390
>>>> [ 164.130062] ? process_one_work+0x630/0x630
>>>> [ 164.130609] kthread+0x11d/0x140
>>>> [ 164.130936] ? kthread_park+0x80/0x80
>>>> [ 164.131339] ret_from_fork+0x3a/0x50
>>>> [ 164.131676] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs lockd
>>>> grace auth_rpcgss sunrpc
>>>> [ 164.132719] CR2: 0000000000000008
>>>> [ 164.133050] ---[ end trace b4028a6781a696ad ]---
>>>>
>>>
>>> I just encountered this repeatedly with cthon04 general tests.
>>>
>>> MNTOPTIONS="rw,proto=tcp,vers=4.1,sec=sys"
>>>
>>>
>>> --
>>> Chuck Lever
>>> chucklever@xxxxxxxxx
>>>
>>>
> --
> Trond Myklebust
> CTO, Hammerspace Inc
> 4300 El Camino Real, Suite 105
> Los Altos, CA 94022
> www.hammer.space

--
Chuck Lever
chucklever@xxxxxxxxx