> On May 8, 2019, at 3:10 PM, Anna Schumaker <schumaker.anna@xxxxxxxxx> wrote: > > On Wed, 2019-05-08 at 14:57 -0400, Chuck Lever wrote: >>> On May 8, 2019, at 2:37 PM, Anna Schumaker <schumaker.anna@xxxxxxxxx> wrote: >>> >>> Hi Chuck, >>> >>> What NFS version are you using for testing? And does it happen consistently >>> on >>> the same xfstest? >> >> NFSv3 with SCRATCH_DEV mounted with default (NFSv4.1). I have a Kerberos >> keytab so even though both mounts are sec=sys, the NFSv4.1 mount uses >> krb5i for lease management. > > I usually use the same version for tests and scratch, so that might be why I > haven't seen this yet. I'll try your setup and see if I can hit anything. > >> >> The gss_auth BUG pops usually during unmount, but for the last few >> times it was popping after every test finished. >> >> The __alloc_pages_nodemask warning happens once per boot, right when >> the tests start. I think it's a WARN_ON_ONCE ? >> >> The BUGs at the bottom of this e-mail happened when I tried to umount >> and clean up (I ^C'd at generic/052). >> >> I'm trying a bisect over linux-next right now, for grins. > > Let me know what you find! ac83228a7101e655ba5a7fa61ae10b058ada15db is the first bad commit commit ac83228a7101e655ba5a7fa61ae10b058ada15db Author: Trond Myklebust <trondmy@xxxxxxxxx> Date: Wed Apr 24 17:46:45 2019 -0400 SUNRPC: Use namespace of listening daemon in the client AUTH_GSS upcall When the client needs to talk to rpc.gssd, we should ensure that the uid argument is encoded to match the user namespace of the daemon. Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> Signed-off-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx> :040000 040000 b7690384eb71926eca4a471986228ce9262fe702 b31440d1a4e6ad51d4f41db84a45b5c4f62e7495 M net I haven't looked closely at this commit... time for dinner. > Anna > >> >> >>> Thanks, >>> Anna >>> >>> On Wed, 2019-05-08 at 14:30 -0400, Chuck Lever wrote: >>>> Hit a few oddities, so I enabled memory debugging. >>>> >>>> This happens over and over during xfstests: >>>> >>>> May 8 11:19:42 manet kernel: >>>> ========================================================================== >>>> === >>>> May 8 11:19:42 manet kernel: BUG kmalloc-128 (Tainted: >>>> G B ): >>>> Poison overwritten >>>> May 8 11:19:42 manet kernel: -------------------------------------------- >>>> ---- >>>> ----------------------------- >>>> May 8 11:19:42 manet kernel: INFO: 0x000000004ca7d9fd-0x000000003fa83627. >>>> First byte 0x6a instead of 0x6b >>>> May 8 11:19:42 manet kernel: INFO: Allocated in gss_create+0x8b/0x32f >>>> [auth_rpcgss] age=9633 cpu=11 pid=21120 >>>> >>>> This is the first field in struct gss_auth, which is a kref object. >>>> >>>> I also see this: >>>> >>>> May 8 14:21:16 manet kernel: WARNING: CPU: 10 PID: 45 at >>>> /home/cel/src/linux/anna/mm/page_alloc.c:4584 >>>> __alloc_pages_nodemask+0x3f/0x2c2 >>>> May 8 14:21:16 manet kernel: Modules linked in: cts rpcsec_gss_krb5 >>>> ib_umad >>>> ib_ipoib mlx4_ib dm_mirror dm_region_hash dm_log dm_mod dax sb_edac >>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt >>>> iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul >>>> ghash_clmulni_intel rpcrdma aesni_intel rdma_ucm crypto_simd cryptd >>>> ib_iser >>>> glue_helper rdma_cm iw_cm pcspkr ib_cm libiscsi scsi_transport_iscsi >>>> lpc_ich >>>> i2c_i801 mfd_core mei_me sg mei ioatdma wmi ipmi_si ipmi_devintf >>>> ipmi_msghandler acpi_power_meter acpi_pad pcc_cpufreq auth_rpcgss >>>> ip_tables >>>> xfs libcrc32c mlx4_en sr_mod cdrom qedr sd_mod ast drm_kms_helper >>>> syscopyarea >>>> sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crc32c_intel qede igb >>>> ahci >>>> libahci qed libata dca i2c_algo_bit i2c_core crc8 ib_uverbs ib_core nfsv4 >>>> dns_resolver nfsv3 nfs_acl nfs lockd grace sunrpc fscache >>>> May 8 14:21:16 manet kernel: CPU: 10 PID: 45 Comm: kworker/u26:0 Not >>>> tainted >>>> 5.1.0-rc6-00066-g3be130e #286 >>>> May 8 14:21:16 manet kernel: Hardware name: Supermicro SYS-6028R- >>>> T/X10DRi, >>>> BIOS 1.1a 10/16/2015 >>>> May 8 14:21:16 manet kernel: Workqueue: rpciod rpc_async_schedule >>>> [sunrpc] >>>> May 8 14:21:16 manet kernel: RIP: 0010:__alloc_pages_nodemask+0x3f/0x2c2 >>>> May 8 14:21:16 manet kernel: Code: f4 55 53 89 fb 48 83 ec 30 65 48 8b 04 >>>> 25 >>>> 28 00 00 00 48 89 44 24 28 31 c0 48 89 e7 83 fe 0a f3 ab 76 10 80 e7 20 74 >>>> 02 >>>> eb 02 <0f> 0b 31 c0 e9 38 02 00 00 23 1d e5 9b 0d 01 b8 22 01 32 01 48 63 >>>> May 8 14:21:16 manet kernel: RSP: 0018:ffffc9000332fc98 EFLAGS: 00010246 >>>> May 8 14:21:16 manet kernel: RAX: 0000000000000000 RBX: 0000000000040040 >>>> RCX: >>>> 0000000000000000 >>>> May 8 14:21:16 manet kernel: RDX: 0000000000000001 RSI: 0000000000000017 >>>> RDI: >>>> ffffc9000332fcc0 >>>> May 8 14:21:16 manet kernel: RBP: 0000000000000c40 R08: 0000000000000001 >>>> R09: >>>> 0000000000000000 >>>> May 8 14:21:16 manet kernel: R10: 8080808080808080 R11: fefefefefefefeff >>>> R12: >>>> 0000000000000017 >>>> May 8 14:21:16 manet kernel: R13: 000000050909093c R14: 0000000000000000 >>>> R15: >>>> 0000000000004291 >>>> May 8 14:21:16 manet kernel: FS: 0000000000000000(0000) >>>> GS:ffff88886fb00000(0000) knlGS:0000000000000000 >>>> May 8 14:21:16 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: >>>> 0000000080050033 >>>> May 8 14:21:16 manet kernel: CR2: 00007fe2167d1e94 CR3: 0000000002013001 >>>> CR4: >>>> 00000000001606e0 >>>> May 8 14:21:16 manet kernel: Call Trace: >>>> May 8 14:21:16 manet kernel: kmalloc_order+0x1d/0x62 >>>> May 8 14:21:16 manet kernel: kmalloc_order_trace+0x21/0x10d >>>> May 8 14:21:16 manet kernel: __kmalloc+0x42/0x14a >>>> May 8 14:21:16 manet kernel: ? xprt_do_reserve+0x6f/0x173 [sunrpc] >>>> May 8 14:21:16 manet kernel: rpc_malloc+0x61/0xba [sunrpc] >>>> May 8 14:21:16 manet kernel: call_allocate+0xdd/0x1af [sunrpc] >>>> May 8 14:21:16 manet kernel: ? rpc_clnt_add_xprt+0x153/0x153 [sunrpc] >>>> May 8 14:21:16 manet kernel: __rpc_execute+0x135/0x42c [sunrpc] >>>> May 8 14:21:16 manet kernel: rpc_async_schedule+0x29/0x39 [sunrpc] >>>> May 8 14:21:16 manet kernel: process_one_work+0x285/0x4be >>>> May 8 14:21:16 manet kernel: worker_thread+0x1b0/0x26f >>>> May 8 14:21:16 manet kernel: ? cancel_delayed_work_sync+0xf/0xf >>>> May 8 14:21:16 manet kernel: kthread+0xf6/0xfb >>>> May 8 14:21:16 manet kernel: ? kthread_flush_work+0xc6/0xc6 >>>> May 8 14:21:16 manet kernel: ret_from_fork+0x24/0x30 >>>> >>>> And just now, the xfstests got stock and this appeared in v/l/m: >>>> >>>> May 8 14:22:52 manet kernel: BUG: unable to handle kernel paging request >>>> at >>>> ffffffffc0000000 >>>> May 8 14:22:52 manet kernel: #PF error: [INSTR] >>>> May 8 14:22:52 manet kernel: PGD 2016067 P4D 2016067 PUD 2018067 PMD 0 >>>> May 8 14:22:52 manet kernel: Oops: 0010 [#1] SMP >>>> May 8 14:22:52 manet kernel: CPU: 4 PID: 14848 Comm: rpc.gssd Tainted: >>>> G B W 5.1.0-rc6-00066-g3be130e #286 >>>> May 8 14:22:52 manet kernel: Hardware name: Supermicro SYS-6028R- >>>> T/X10DRi, >>>> BIOS 1.1a 10/16/2015 >>>> May 8 14:22:52 manet kernel: RIP: 0010:0xffffffffc0000000 >>>> May 8 14:22:52 manet kernel: Code: Bad RIP value. >>>> May 8 14:22:52 manet kernel: RSP: 0018:ffffc9000cc57cd8 EFLAGS: 00010287 >>>> May 8 14:22:52 manet kernel: RAX: ffff8884618641d8 RBX: ffffc9000cc57da0 >>>> RCX: >>>> ffff88886b9f0050 >>>> May 8 14:22:52 manet kernel: RDX: 0000000000000020 RSI: ffff88846847b550 >>>> RDI: >>>> ffff8884618641d8 >>>> May 8 14:22:52 manet kernel: RBP: ffffffffa04b22a0 R08: 0000000000000dc0 >>>> R09: >>>> 0000000000180011 >>>> May 8 14:22:52 manet kernel: R10: ffffc9000cc57b30 R11: 0000000000000002 >>>> R12: >>>> ffff8884618641d8 >>>> May 8 14:22:52 manet kernel: R13: 0000000000000c40 R14: 0000000000000010 >>>> R15: >>>> ffffc9000cc57d90 >>>> May 8 14:22:52 manet kernel: FS: 00007febec152700(0000) >>>> GS:ffff88846fb00000(0000) knlGS:0000000000000000 >>>> May 8 14:22:52 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: >>>> 0000000080050033 >>>> May 8 14:22:52 manet kernel: CR2: ffffffffbfffffd6 CR3: 0000000469fea003 >>>> CR4: >>>> 00000000001606e0 >>>> May 8 14:22:52 manet kernel: Call Trace: >>>> May 8 14:22:52 manet kernel: ? krb5_derive_key+0x83/0x365 >>>> [rpcsec_gss_krb5] >>>> May 8 14:22:52 manet kernel: ? crypto_cts_setkey+0x2c/0x3b [cts] >>>> May 8 14:22:52 manet kernel: ? >>>> gss_import_sec_context_kerberos+0x823/0xa6d >>>> [rpcsec_gss_krb5] >>>> May 8 14:22:52 manet kernel: ? kmem_cache_alloc_trace+0xe4/0x10b >>>> May 8 14:22:52 manet kernel: ? gss_import_sec_context+0x6c/0xa9 >>>> [auth_rpcgss] >>>> May 8 14:22:52 manet kernel: ? gss_pipe_downcall+0x2cc/0x5b4 >>>> [auth_rpcgss] >>>> May 8 14:22:52 manet kernel: ? rpc_pipe_write+0x56/0x6d [sunrpc] >>>> May 8 14:22:52 manet kernel: ? vfs_write+0xa3/0xfa >>>> May 8 14:22:52 manet kernel: ? ksys_write+0x60/0xb3 >>>> May 8 14:22:52 manet kernel: ? do_syscall_64+0x5a/0x68 >>>> May 8 14:22:52 manet kernel: ? entry_SYSCALL_64_after_hwframe+0x49/0xbe >>>> May 8 14:22:52 manet kernel: Modules linked in: cts rpcsec_gss_krb5 >>>> ib_umad >>>> ib_ipoib mlx4_ib dm_mirror dm_region_hash dm_log dm_mod dax sb_edac >>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt >>>> iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul >>>> ghash_clmulni_intel rpcrdma aesni_intel rdma_ucm crypto_simd cryptd >>>> ib_iser >>>> glue_helper rdma_cm iw_cm pcspkr ib_cm libiscsi scsi_transport_iscsi >>>> lpc_ich >>>> i2c_i801 mfd_core mei_me sg mei ioatdma wmi ipmi_si ipmi_devintf >>>> ipmi_msghandler acpi_power_meter acpi_pad pcc_cpufreq auth_rpcgss >>>> ip_tables >>>> xfs libcrc32c mlx4_en sr_mod cdrom qedr sd_mod ast drm_kms_helper >>>> syscopyarea >>>> sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crc32c_intel qede igb >>>> ahci >>>> libahci qed libata dca i2c_algo_bit i2c_core crc8 ib_uverbs ib_core nfsv4 >>>> dns_resolver nfsv3 nfs_acl nfs lockd grace sunrpc fscache >>>> May 8 14:22:52 manet kernel: CR2: ffffffffc0000000 >>>> May 8 14:22:52 manet kernel: ---[ end trace 3c526e4155ae084b ]--- >>>> May 8 14:22:52 manet kernel: RIP: 0010:0xffffffffc0000000 >>>> May 8 14:22:52 manet kernel: Code: Bad RIP value. >>>> May 8 14:22:52 manet kernel: RSP: 0018:ffffc9000cc57cd8 EFLAGS: 00010287 >>>> May 8 14:22:52 manet kernel: RAX: ffff8884618641d8 RBX: ffffc9000cc57da0 >>>> RCX: >>>> ffff88886b9f0050 >>>> May 8 14:22:52 manet kernel: RDX: 0000000000000020 RSI: ffff88846847b550 >>>> RDI: >>>> ffff8884618641d8 >>>> May 8 14:22:52 manet kernel: RBP: ffffffffa04b22a0 R08: 0000000000000dc0 >>>> R09: >>>> 0000000000180011 >>>> May 8 14:22:52 manet kernel: R10: ffffc9000cc57b30 R11: 0000000000000002 >>>> R12: >>>> ffff8884618641d8 >>>> May 8 14:22:52 manet kernel: R13: 0000000000000c40 R14: 0000000000000010 >>>> R15: >>>> ffffc9000cc57d90 >>>> May 8 14:22:52 manet kernel: FS: 00007febec152700(0000) >>>> GS:ffff88846fb00000(0000) knlGS:0000000000000000 >>>> May 8 14:22:52 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: >>>> 0000000080050033 >>>> May 8 14:22:52 manet kernel: CR2: ffffffffbfffffd6 CR3: 0000000469fea003 >>>> CR4: >>>> 00000000001606e0 >>>> May 8 14:22:52 manet kernel: BUG: sleeping function called from invalid >>>> context at /home/cel/src/linux/anna/include/linux/percpu-rwsem.h:34 >>>> May 8 14:22:52 manet kernel: in_atomic(): 0, irqs_disabled(): 1, pid: >>>> 14848, >>>> name: rpc.gssd >>>> May 8 14:22:52 manet kernel: INFO: lockdep is turned off. >>>> May 8 14:22:52 manet kernel: irq event stamp: 0 >>>> May 8 14:22:52 manet kernel: hardirqs last enabled at (0): >>>> [<0000000000000000>] (null) >>>> May 8 14:22:52 manet kernel: hardirqs last disabled at (0): >>>> [<ffffffff8106e944>] copy_process+0x774/0x1e24 >>>> May 8 14:22:52 manet kernel: softirqs last enabled at (0): >>>> [<ffffffff8106e944>] copy_process+0x774/0x1e24 >>>> May 8 14:22:52 manet kernel: softirqs last disabled at (0): >>>> [<0000000000000000>] (null) >>>> May 8 14:22:52 manet kernel: CPU: 4 PID: 14848 Comm: rpc.gssd Tainted: >>>> G B >>>> D W 5.1.0-rc6-00066-g3be130e #286 >>>> May 8 14:22:52 manet kernel: Hardware name: Supermicro SYS-6028R- >>>> T/X10DRi, >>>> BIOS 1.1a 10/16/2015 >>>> May 8 14:22:52 manet kernel: Call Trace: >>>> May 8 14:22:52 manet kernel: dump_stack+0x78/0xa9 >>>> May 8 14:22:52 manet kernel: ___might_sleep+0x1b8/0x1cd >>>> May 8 14:22:52 manet kernel: exit_signals+0x35/0x195 >>>> May 8 14:22:52 manet kernel: do_exit+0x122/0xa61 >>>> May 8 14:22:52 manet kernel: ? ksys_write+0x60/0xb3 >>>> May 8 14:22:52 manet kernel: rewind_stack_do_exit+0x17/0x20 >>>> >>>> >>>> -- >>>> Chuck Lever >>>> >>>> >>>> >> >> -- >> Chuck Lever >> >> >> > -- Chuck Lever