Re: ib_uverbs: list corruption destroying a cq

On Wed, Jul 26, 2017 at 9:52 AM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Hey all,
>
> The test group hit this during a heavy rdma stress test that sets up a few
> thousand connections, runs some IO, then tears down the connections.  It
> repeatedly does this.  After around 4 hours, they see the warning below.  Looks
> like the list pointers were from freed memory (poisoned)?  This is with
> linux-4.13-rc2.
>
> Has anyone else seen this?  I didn't find anything looking in recent posts...
>
> Thanks,
>
> Steve
>
> ---
>
> list_del corruption. prev->next should be ffff9514cf64be90, but was
> dead000000000100
> ------------[ cut here ]------------
> WARNING: CPU: 3 PID: 27966 at lib/list_debug.c:53
> __list_del_entry_valid+0x83/0xa0
> Modules linked in: rdma_ucm iw_cxgb4 cxgb4 nfsv3 nfs_acl nfs fscache lockd grace
> rpcrdma sunrpc rdma_cm ib_cm iw_cm ib_uverbs ebtable_nat ebtables ipt_REJECT
> nf_reject_ipv4 xt_CHECKSUM bridge autofs4 target_core_iblock target_core_file
> target_core_pscsi target_core_mod configfs bnx2fc cnic uio fcoe libfcoe libfc
> 8021q garp scsi_transport_fc stp llc dm_mirror dm_region_hash dm_log vhost_net
> vhost tap tun kvm_intel kvm irqbypass uinput ppdev floppy parport_pc parport
> iTCO_wdt iTCO_vendor_support pcspkr serio_raw sg i2c_i801 lpc_ich mfd_core igb
> dca shpchp i5400_edac i5k_amb dm_mod(E) dax(E) ext4(E) jbd2(E) mbcache(E)
> sd_mod(E) pata_acpi(E) ata_generic(E) ata_piix(E) ib_core(E) libcxgb(E) ipv6(E)
> crc_ccitt(E) ptp(E) pps_core(E) radeon(E) ttm(E) drm_kms_helper(E) drm(E)
> fb_sys_fops(E) sysimgblt(E)
>  sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) [last unloaded:
> cxgb4]
> CPU: 3 PID: 27966 Comm: mbw Tainted: G            E   4.13.0-rc2 #1
> Hardware name: Supermicro X7DWU/X7DWU, BIOS 1.2c 11/19/2010
> task: ffff951450fb6780 task.stack: ffffa81588144000
> RIP: 0010:__list_del_entry_valid+0x83/0xa0
> RSP: 0000:ffffa81588147b38 EFLAGS: 00010092
> RAX: 0000000000000054 RBX: ffff9514731e4240 RCX: 0000000000000000
> RDX: ffff9514efd94880 RSI: ffff9514efd8cb68 RDI: ffff9514efd8cb68
> RBP: ffffa81588147b38 R08: 0000000000000004 R09: 0000000000000000
> R10: 0000000000000074 R11: 000000000000000f R12: ffff9514a230b000
> R13: ffff9514cf64be80 R14: ffff9514d19bab38 R15: ffff9514d19bab58
> FS:  000014e8e054d720(0000) GS:ffff9514efd80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000006df4b0 CR3: 000000052dcb9000 CR4: 00000000000406e0
> Call Trace:
>  ib_uverbs_release_ucq+0x64/0x160 [ib_uverbs]
>  uverbs_free_cq+0x51/0x80 [ib_uverbs]
>  remove_commit_idr_uobject+0x22/0x50 [ib_uverbs]
>  ? uverbs_uobject_free+0x32/0x40 [ib_uverbs]
>  uverbs_cleanup_ucontext+0xe6/0x1a0 [ib_uverbs]
>  ib_uverbs_cleanup_ucontext+0x23/0x40 [ib_uverbs]
>  ib_uverbs_close+0x3c/0x120 [ib_uverbs]
>  __fput+0xc8/0x240
>  ____fput+0xe/0x10
>  task_work_run+0x68/0xa0
>  ? free_fs_struct+0x32/0x40
>  do_exit+0x16a/0x470
>  ? __getnstimeofday64+0x4d/0xf0
>  ? getnstimeofday64+0xe/0x20
>  ? __audit_syscall_entry+0xaa/0x100
>  do_group_exit+0x4e/0xc0
>  SyS_exit_group+0x17/0x20
>  do_syscall_64+0x55/0xd0
>  entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x3fe06acf38
> RSP: 002b:00007ffc10a6efd8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> RAX: ffffffffffffffda RBX: 0000003fe098a838 RCX: 0000003fe06acf38
> RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
> RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff98
> R10: 0000003fe0991828 R11: 0000000000000246 R12: 0000003fe098a838
> R13: 00007ffc10a6f0d0 R14: 0000000000000000 R15: 0000000000000000
> Code: c0 c9 c3 48 89 fe 31 c0 48 c7 c7 78 17 a2 93 e8 78 a2 d9 ff 0f ff 31 c0 c9
> c3 48 89 fe 31 c0 48 c7 c7 38 17 a2 93 e8 61 a2 d9 ff <0f> ff 31 c0 c9 c3 48 89
> fe 31 c0 48 c7 c7 00 17 a2 93 e8 4a a2
> ---[ end trace 8aab4de4e7eb9238 ]---

We have hit a similar list_del corruption warning with iSER on a 4.9.x
kernel (4.9.32 in the trace below). I'm not sure whether the two are related.

[174144.405626] ------------[ cut here ]------------
[174144.405635] WARNING: CPU: 11 PID: 11466 at lib/list_debug.c:62
__list_del_entry+0x82/0xd0
[174144.405636] list_del corruption. next->prev should be
ffff887ae67112b0, but was ffff887ae6701b68
[174144.405682] Modules linked in: ib_isert target_core_user uio
target_core_pscsi target_core_file target_core_iblock iscsi_target_mod
ip_vs nf_conntrack macvlan bonding iptable_filter ib_iser rdma_ucm
ib_ucm ib_uverbs ib_umad ipmi_devintf sb_edac edac_core
x86_pkg_temp_thermal intel_powerclamp coretemp raid10 zfs(PO) iTCO_wdt
iTCO_vendor_support kvm_intel zunicode(PO) zavl(PO) kvm zcommon(PO)
znvpair(PO) spl(O) irqbypass pcspkr joydev i2c_i801 i2c_smbus sg
mei_me lpc_ich mei mfd_core ioatdma shpchp ipmi_si ipmi_msghandler
acpi_power_meter acpi_pad ip_tables xfs libcrc32c mlx4_en mlx4_ib
raid1 rdma_cm iw_cm ib_cm mlx5_ib ib_core sd_mod 8021q garp mrp
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw
gf128mul glue_helper ablk_helper cryptd ast drm_kms_helper syscopyarea
sysfillrect sysimgblt
[174144.405690]  mlx5_core fb_sys_fops ttm mlx4_core drm ahci libahci
igb libata dca ptp pps_core i2c_algo_bit wmi sunrpc dm_mirror
dm_region_hash dm_log dm_mod
[174144.405692] CPU: 11 PID: 11466 Comm: kworker/11:2 Tainted: P
    O    4.9.32-5.el7.centos.x86_64 #1
[174144.405693] Hardware name: Supermicro SYS-6028TP-HTFR/X10DRT-PIBF,
BIOS 1.1 08/03/2015
[174144.405701] Workqueue: target_completion target_complete_ok_work
[174144.405704]  ffffc90369e03d50 ffffffff8134fbdc ffffc90369e03da0
0000000000000000
[174144.405705]  ffffc90369e03d90 ffffffff81083501 0000003e00000246
ffff887ae67112a8
[174144.405707]  ffff887f658ca0c0 ffff887f7f2d8800 ffff887f7f2e3c00
ffff887ae67112b0
[174144.405708] Call Trace:
[174144.405715]  [<ffffffff8134fbdc>] dump_stack+0x63/0x87
[174144.405718]  [<ffffffff81083501>] __warn+0xd1/0xf0
[174144.405719]  [<ffffffff8108357f>] warn_slowpath_fmt+0x5f/0x80
[174144.405721]  [<ffffffff81515b59>] ? target_complete_ok_work+0x169/0x360
[174144.405723]  [<ffffffff8136f552>] __list_del_entry+0x82/0xd0
[174144.405726]  [<ffffffff8109d042>] process_one_work+0xe2/0x400
[174144.405727]  [<ffffffff8109d9a5>] worker_thread+0x125/0x4b0
[174144.405729]  [<ffffffff8109d880>] ? rescuer_thread+0x380/0x380
[174144.405730]  [<ffffffff8109d880>] ? rescuer_thread+0x380/0x380
[174144.405733]  [<ffffffff810a36b6>] kthread+0xe6/0x100
[174144.405735]  [<ffffffff810a35d0>] ? kthread_park+0x60/0x60
[174144.405738]  [<ffffffff8175aa55>] ret_from_fork+0x25/0x30
[174144.405739] ---[ end trace 131fc2a58d958f73 ]---
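
For comparing the two warnings, here is a small userspace sketch of what
the list debug checks look at. It is not the kernel code: the names
list_del_check(), demo_stale_prev() and the enum are made up for
illustration, and the poison constants are assumed to be the x86_64 values
(dead000000000100 is what list_del() leaves in ->next of a removed entry).
The first trace reports the "prev->next" mismatch with the poison value,
which suggests the entry's neighbour had already been deleted; the second
reports the "next->prev" mismatch with two ordinary-looking pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

/* Assumed x86_64 poison values; only compared, never dereferenced. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000000200ULL)

/* Hypothetical result codes, one per check. */
enum list_del_result {
	LIST_DEL_OK,
	ENTRY_NEXT_POISONED,	/* entry itself was already deleted */
	ENTRY_PREV_POISONED,
	PREV_NEXT_MISMATCH,	/* shape of the first warning in this thread */
	NEXT_PREV_MISMATCH	/* shape of the second warning */
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Validate an entry before unlinking it, in the spirit of the debug checks. */
static enum list_del_result list_del_check(const struct list_head *entry)
{
	if (entry->next == LIST_POISON1)
		return ENTRY_NEXT_POISONED;
	if (entry->prev == LIST_POISON2)
		return ENTRY_PREV_POISONED;
	if (entry->prev->next != entry) {
		fprintf(stderr, "list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)entry->prev->next);
		return PREV_NEXT_MISMATCH;
	}
	if (entry->next->prev != entry) {
		fprintf(stderr, "list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)entry->next->prev);
		return NEXT_PREV_MISMATCH;
	}
	return LIST_DEL_OK;
}

/* Unlink only if the checks pass, then poison the removed entry. */
static enum list_del_result list_del(struct list_head *entry)
{
	enum list_del_result res = list_del_check(entry);

	if (res != LIST_DEL_OK)
		return res;
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
	return LIST_DEL_OK;
}

/*
 * Contrived reproduction: delete 'a', then put a stale pointer to 'a'
 * back into b.prev by hand, as if another path still held the old link.
 */
static enum list_del_result demo_stale_prev(void)
{
	struct list_head head, a, b;
	enum list_del_result res;

	list_init(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);

	res = list_del(&a);
	assert(res == LIST_DEL_OK);
	(void)res;

	b.prev = &a;		/* stale link to an already-deleted entry */
	return list_del(&b);	/* a.next is LIST_POISON1, so the check trips */
}
```

Calling demo_stale_prev() trips the prev->next check and prints a message
of the same shape as the first warning. It only shows which state the
check reports; it says nothing about which code path in ib_uverbs or the
target core actually produced that state.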
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1