Re: oops in 5.4 on rdma

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 10, 2019 at 12:37 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
>
>
> > On Dec 10, 2019, at 12:34 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> >
> > Hi Chuck,
> >
> > Is this known? Running your cel/testing from commit
> > 37e235c0128566e9d97741ad1e546b44f324f108
>
> The WARNING does not look familiar, but cel-testing has moved on.
> Can you fetch it again?

Ok I'm now at commit 192702977a49ebfeff138f51d8ad8bc524d812ea and I
don't see it while running generic/013.

>
>
> > I started generic/013 and test hung for long time, got this but then
> > test ran successfully.
> >
> > [  153.452029] ------------[ cut here ]------------
> > [  153.507281] WARNING: CPU: 14 PID: 975 at
> > drivers/infiniband/core/cq.c:310 ib_free_cq_user+0xea/0x100 [ib_core]
> > [  153.626988] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver
> > nfs fscache rdma_rxe ip6_udp_tunnel udp_tunnel nfsd auth_rpcgss
> > nfs_acl lockd grace xt_CHECKSUM xt_MASQUERADE tun bridge stp llc
> > ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6
> > xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat
> > ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_nat
> > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle
> > iptable_security iptable_raw ebtable_filter ebtables ip6table_filter
> > ip6_tables iptable_filter ib_isert iscsi_target_mod ib_srpt
> > target_core_mod ib_srp scsi_transport_srp rpcrdma sunrpc
> > intel_rapl_msr intel_rapl_common rdma_ucm x86_pkg_temp_thermal ib_iser
> > intel_powerclamp coretemp rdma_cm kvm_intel iw_cm ib_umad ib_ipoib
> > libiscsi kvm scsi_transport_iscsi ib_cm irqbypass crct10dif_pclmul
> > mlx5_ib crc32_pclmul iTCO_wdt ipmi_ssif ghash_clmulni_intel
> > iTCO_vendor_support aesni_intel ib_uverbs crypto_simd ipmi_si cryptd
> > ipmi_devintf pcspkr ib_core
> > [  153.627026]  glue_helper i2c_i801 sg lpc_ich ipmi_msghandler wmi
> > acpi_power_meter ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper
> > syscopyarea sysfillrect sysimgblt fb_sys_fops drm_vram_helper ttm isci
> > mlx5_core libsas igb drm ahci qla2xxx libahci scsi_transport_sas
> > libata dca crc32c_intel i2c_algo_bit i2c_core scsi_transport_fc
> > pci_hyperv_intf dm_mirror dm_region_hash dm_log dm_mod
> > [  155.086407] CPU: 14 PID: 975 Comm: kworker/u52:0 Not tainted 5.4.0+ #1
> > [  155.164520] Hardware name: FUJITSU PRIMERGY RX200 S7/D3032-A1, BIOS
> > V4.6.5.3 R2.29.0 for D3032-A1x 06/18/2018
> > [  155.283237] Workqueue: xprtiod xprt_autoclose [sunrpc]
> > [  155.344725] RIP: 0010:ib_free_cq_user+0xea/0x100 [ib_core]
> > [  155.410365] Code: d7 48 8b 03 48 85 c0 75 e8 e9 6a ff ff ff 48 8d
> > 7f 40 e8 89 9a 52 d6 e9 57 ff ff ff 48 8d 7f 40 e8 0b de 86 d6 e9 49
> > ff ff ff <0f> 0b 5b 5d 41 5c c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00
> > 00 00
> > [  155.635114] RSP: 0018:ffff98e4c6aebda0 EFLAGS: 00010202
> > [  155.697624] RAX: 0000000000000001 RBX: ffff8b85efdb8000 RCX: 0000000000000000
> > [  155.783015] RDX: ffff8b861516ae80 RSI: 0000000000000000 RDI: ffff8b8df0087000
> > [  155.868404] RBP: ffff8b8df0087000 R08: 0000000000000001 R09: 0000000000000000
> > [  155.953795] R10: ffff8b8e1724b000 R11: ffffffffffffffa6 R12: ffff8b85efdb8000
> > [  156.039186] R13: 0000000000000000 R14: ffff8b86071cb000 R15: ffff8b85efdb8448
> > [  156.124577] FS:  0000000000000000(0000) GS:ffff8b861fa00000(0000)
> > knlGS:0000000000000000
> > [  156.221405] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  156.290157] CR2: 00007fde99d85000 CR3: 000000025f20a003 CR4: 00000000000606e0
> > [  156.375548] Call Trace:
> > [  156.404805]  rpcrdma_ep_destroy+0x43/0x70 [rpcrdma]
> > [  156.463171]  rpcrdma_ep_disconnect+0xf2/0x1c0 [rpcrdma]
> > [  156.525683]  ? __switch_to_asm+0x34/0x70
> > [  156.572589]  ? __switch_to_asm+0x40/0x70
> > [  156.619500]  ? __switch_to_asm+0x34/0x70
> > [  156.666409]  ? __switch_to_asm+0x40/0x70
> > [  156.713321]  ? __switch_to_asm+0x34/0x70
> > [  156.760238]  xprt_rdma_close+0x49/0xc0 [rpcrdma]
> > [  156.815481]  xprt_autoclose+0x50/0xb0 [sunrpc]
> > [  156.868635]  process_one_work+0x171/0x380
> > [  156.916584]  worker_thread+0x49/0x3f0
> > [  156.960375]  kthread+0xf8/0x130
> > [  156.997926]  ? max_active_store+0x80/0x80
> > [  157.045875]  ? kthread_bind+0x10/0x10
> > [  157.089665]  ret_from_fork+0x35/0x40
> > [  157.132416] ---[ end trace dcd41693526c20ae ]---
> >
> > Var log also had this:
> > Dec 10 12:37:03 localhost kolga: run xfstest generic/013
> > Dec 10 12:37:03 localhost journal: run fstests generic/013 at
> > 2019-12-10 12:37:03
> > Dec 10 12:39:54 localhost kernel: INFO: task kworker/6:2:295 blocked
> > for more than 122 seconds.
> > Dec 10 12:39:54 localhost kernel:      Tainted: G        W         5.4.0+ #1
> > Dec 10 12:39:54 localhost kernel: "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Dec 10 12:39:55 localhost kernel: kworker/6:2     D    0   295      2 0x80004000
> > Dec 10 12:39:55 localhost kernel: Workqueue: events xprt_destroy_cb [sunrpc]
> > Dec 10 12:39:55 localhost kernel: Call Trace:
> > Dec 10 12:39:55 localhost kernel: ? __schedule+0x2d1/0x6c0
> > Dec 10 12:39:55 localhost kernel: schedule+0x39/0xa0
> > Dec 10 12:39:55 localhost kernel: schedule_timeout+0x1c8/0x290
> > Dec 10 12:39:55 localhost kernel: ? tracing_is_on+0x11/0x30
> > Dec 10 12:39:55 localhost kernel: ? trace_save_cmdline+0x68/0xd0
> > Dec 10 12:39:55 localhost kernel: wait_for_completion+0x123/0x190
> > Dec 10 12:39:55 localhost kernel: ? wake_up_q+0x70/0x70
> > Dec 10 12:39:55 localhost kernel: __flush_work.isra.35+0x11e/0x1a0
> > Dec 10 12:39:55 localhost kernel: ? get_work_pool+0x40/0x40
> > Dec 10 12:39:55 localhost kernel: __cancel_work_timer+0x103/0x190
> > Dec 10 12:39:55 localhost kernel: xprt_rdma_destroy+0x22/0xb0 [rpcrdma]
> > Dec 10 12:39:55 localhost kernel: process_one_work+0x171/0x380
> > Dec 10 12:39:55 localhost kernel: worker_thread+0x49/0x3f0
> > Dec 10 12:39:55 localhost kernel: kthread+0xf8/0x130
> > Dec 10 12:39:55 localhost kernel: ? max_active_store+0x80/0x80
> > Dec 10 12:39:55 localhost kernel: ? kthread_bind+0x10/0x10
> > Dec 10 12:39:55 localhost kernel: ret_from_fork+0x35/0x40
>
> --
> Chuck Lever
>
>
>



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux