Re: Kernel v4.16 / v4.17 SRP and SRPT patches

On Tue, 2018-01-09 at 15:31 -0500, Laurence Oberman wrote:
> On Tue, 2018-01-09 at 15:15 -0500, Laurence Oberman wrote:
> > [  220.843344] ------------[ cut here ]------------
> > [  220.869309] list_add corruption. prev->next should be next
> > (000000002a07d255), but was           (null).
> > (prev=000000000edf5e8c).
> > [  220.935392] WARNING: CPU: 1 PID: 694 at lib/list_debug.c:28
> > __list_add_valid+0x6a/0x70
> > [  220.979462] Modules linked in: xt_CHECKSUM iptable_mangle
> > ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
> > nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT
> > nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables
> > ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert
> > iscsi_target_mod target_core_mod ib_iser libiscsi
> > scsi_transport_iscsi
> > ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
> > rdma_cm ib_cm iw_cm mlx5_ib ib_core intel_powerclamp coretemp
> > kvm_intel
> > kvm irqbypass crct10dif_pclmul crc32_pclmul ipmi_ssif
> > ghash_clmulni_intel pcbc aesni_intel joydev ipmi_si crypto_simd
> > dm_service_time iTCO_wdt hpwdt iTCO_vendor_support glue_helper cryptd
> > ipmi_devintf sg gpio_ich pcspkr hpilo ipmi_msghandler lpc_ich
> > acpi_power_meter i7core_edac shpchp
> > [  221.385270]  pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace
> > sunrpc
> > dm_multipath ip_tables xfs libcrc32c radeon i2c_algo_bit
> > drm_kms_helper
> > syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core mlxfw
> > sd_mod drm ptp hpsa pps_core crc32c_intel i2c_core serio_raw bnx2
> > devlink scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
> > [  221.554496] CPU: 1 PID: 694 Comm: kworker/1:1H Tainted:
> > G          I      4.15.0-rc7+ #1
> > [  221.606907] Hardware name: HP ProLiant DL380 G7, BIOS P67
> > 08/16/2015
> > [  221.642980] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> > [  221.674616] RIP: 0010:__list_add_valid+0x6a/0x70
> > [  221.700561] RSP: 0018:ffffb2bdc75c7cf0 EFLAGS: 00010086
> > [  221.730608] RAX: 0000000000000000 RBX: ffff94342d610880 RCX:
> > ffffffff8ba62928
> > [  221.771490] RDX: 0000000000000001 RSI: 0000000000000082 RDI:
> > 0000000000000046
> > [  221.812721] RBP: ffff94342d6108b8 R08: 0000000000000000 R09:
> > 0000000000000722
> > [  221.853073] R10: 0000000000000000 R11: ffffb2bdc75c7a58 R12:
> > 0000000000000200
> > [  221.894156] R13: 0000000000000246 R14: ffff943fb7fd5000 R15:
> > ffff943fb7fd5000
> > [  221.935233] FS:  0000000000000000(0000) GS:ffff944033200000(0000)
> > knlGS:0000000000000000
> > [  221.980521] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  222.013062] CR2: 00007f1bdc0ee910 CR3: 00000017e7e0a002 CR4:
> > 00000000000206e0
> > [  222.052302] Call Trace:
> > [  222.065971]  ib_mad_post_receive_mads+0x177/0x310 [ib_core]
> > [  222.097349]  ib_mad_recv_done+0x471/0x9c0 [ib_core]
> > [  222.124387]  __ib_process_cq+0x55/0xa0 [ib_core]
> > [  222.150827]  ib_cq_poll_work+0x1b/0x60 [ib_core]
> > [  222.177751]  process_one_work+0x141/0x340
> > [  222.200383]  worker_thread+0x47/0x3e0
> > [  222.220641]  kthread+0xf5/0x130
> > [  222.238951]  ? rescuer_thread+0x380/0x380
> > [  222.262034]  ? kthread_associate_blkcg+0x90/0x90
> > [  222.288514]  ? do_group_exit+0x39/0xa0
> > [  222.309492]  ret_from_fork+0x1f/0x30
> > [  222.330073] Code: fe 31 c0 48 c7 c7 98 36 89 8b e8 02 9c cf ff 0f
> > ff
> > 31 c0 c3 48 89 d1 48 c7 c7 48 36 89 8b 48 89 f2 48 89 c6 31 c0 e8 e6
> > 9b
> > cf ff <0f> ff 31 c0 c3 90 48 8b 07 48 b9 00 01 00 00 00 00 ad de 48
> > 8b 
> > [  222.438058] ---[ end trace 5d41544bf17ab73b ]---
> > [  222.465993] BUG: unable to handle kernel NULL pointer dereference
> > at
> > 0000000000000028
> > [  222.510316] IP: ib_mad_post_receive_mads+0x3c/0x310 [ib_core]
> > [  222.543188] PGD 0 P4D 0 
> > [  222.557625] Oops: 0000 [#1] SMP PTI
> > [  222.576674] Modules linked in: xt_CHECKSUM iptable_mangle
> > ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
> > nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT
> > nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables
> > ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert
> > iscsi_target_mod target_core_mod ib_iser libiscsi
> > scsi_transport_iscsi
> > ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
> > rdma_cm ib_cm iw_cm mlx5_ib ib_core intel_powerclamp coretemp
> > kvm_intel
> > kvm irqbypass crct10dif_pclmul crc32_pclmul ipmi_ssif
> > ghash_clmulni_intel pcbc aesni_intel joydev ipmi_si crypto_simd
> > dm_service_time iTCO_wdt hpwdt iTCO_vendor_support glue_helper cryptd
> > ipmi_devintf sg gpio_ich pcspkr hpilo ipmi_msghandler lpc_ich
> > acpi_power_meter i7core_edac shpchp
> > [  222.981443]  pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace
> > sunrpc
> > dm_multipath ip_tables xfs libcrc32c radeon i2c_algo_bit
> > drm_kms_helper
> > syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core mlxfw
> > sd_mod drm ptp hpsa pps_core crc32c_intel i2c_core serio_raw bnx2
> > devlink scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
> > [  223.152359] CPU: 1 PID: 694 Comm: kworker/1:1H Tainted: G        W
> > I      4.15.0-rc7+ #1
> > [  223.198577] Hardware name: HP ProLiant DL380 G7, BIOS P67
> > 08/16/2015
> > [  223.235101] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> > [  223.266750] RIP: 0010:ib_mad_post_receive_mads+0x3c/0x310
> > [ib_core]
> > [  223.303012] RSP: 0018:ffffb2bdc75c7cf8 EFLAGS: 00010286
> > [  223.333022] RAX: 0000000000000000 RBX: ffff94342d610908 RCX:
> > ffff94342d610948
> > [  223.373307] RDX: 0000000000000001 RSI: ffff94342d6108c0 RDI:
> > ffff94342d610908
> > [  223.414451] RBP: ffff94342d610940 R08: ffff94342a8e64c0 R09:
> > ffff94342a8e64e8
> > [  223.454789] R10: ffff94342a8e64e8 R11: ffff94342d6109a8 R12:
> > ffff944029c2e048
> > [  223.496554] R13: 0000000000000000 R14: ffff94342a8e64c0 R15:
> > ffff94342d6108c0
> > [  223.537489] FS:  0000000000000000(0000) GS:ffff944033200000(0000)
> > knlGS:0000000000000000
> > [  223.583538] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  223.616545] CR2: 0000000000000028 CR3: 00000017e7e0a002 CR4:
> > 00000000000206e0
> > [  223.657337] Call Trace:
> > [  223.671022]  ? find_mad_agent+0x77/0x1b0 [ib_core]
> > [  223.698581]  ? __kmalloc+0x1be/0x1f0
> > [  223.719074]  ib_mad_recv_done+0x471/0x9c0 [ib_core]
> > [  223.747190]  __ib_process_cq+0x55/0xa0 [ib_core]
> > [  223.774140]  ib_cq_poll_work+0x1b/0x60 [ib_core]
> > [  223.800719]  process_one_work+0x141/0x340
> > [  223.824120]  worker_thread+0x47/0x3e0
> > [  223.845133]  kthread+0xf5/0x130
> > [  223.863116]  ? rescuer_thread+0x380/0x380
> > [  223.886173]  ? kthread_associate_blkcg+0x90/0x90
> > [  223.912207]  ? do_group_exit+0x39/0xa0
> > [  223.933198]  ret_from_fork+0x1f/0x30
> > [  223.953218] Code: 55 41 54 55 48 8d 6f 38 53 48 89 fb 48 83 ec 50
> > 65
> > 48 8b 04 25 28 00 00 00 48 89 44 24 48 31 c0 48 8b 07 48 85 f6 48 89
> > 4c
> > 24 08 <48> 8b 50 28 8b 12 48 c7 44 24 28 00 00 00 00 c7 44 24 40 01
> > 00 
> > [  224.059985] RIP: ib_mad_post_receive_mads+0x3c/0x310 [ib_core]
> > RSP:
> > ffffb2bdc75c7cf8
> > [  224.103994] CR2: 0000000000000028
> 
> Just wanted to add that the panic is consistent, rebooted into only a
> single path to my SRP LUNS and on reboot had the same panic.

Hello Laurence,

Can you repeat your test with the following two kernels:
* v4.15-rc7 (Linus' latest).
* The for-next branch of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git.

I'm asking this because the crash occurred in a code path that is not modified by
any of my patches.
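
For reference, the first warning in the log comes from the list debugging
check that runs before a list insertion. What follows is only a minimal
userspace sketch of that kind of check, not the kernel source: the names
list_add_valid_sketch(), list_add_checked() and run_demo() are made up for
illustration, and the message text is modelled on the log above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Circular doubly linked list node, same shape as the usual list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Hypothetical analogue of a list_add sanity check: before inserting
 * 'new' between 'prev' and 'next', verify the neighbours still point
 * at each other.
 */
static bool list_add_valid_sketch(struct list_head *new,
				  struct list_head *prev,
				  struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr,
			"list_add corruption. next->prev should be prev (%p), but was %p.\n",
			(void *)prev, (void *)next->prev);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr,
			"list_add corruption. prev->next should be next (%p), but was %p.\n",
			(void *)next, (void *)prev->next);
		return false;
	}
	if (new == prev || new == next) {
		fprintf(stderr, "list_add double add.\n");
		return false;
	}
	return true;
}

/* Insert 'new' between 'prev' and 'next' only if the check passes. */
static bool list_add_checked(struct list_head *new,
			     struct list_head *prev,
			     struct list_head *next)
{
	if (!list_add_valid_sketch(new, prev, next))
		return false;
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
	return true;
}

/* Returns 0 if a clean add succeeds and an add after corruption is refused. */
static int run_demo(void)
{
	struct list_head head = { &head, &head };
	struct list_head a, b;

	bool ok = list_add_checked(&a, &head, head.next);

	/* Simulate the state in the log: prev->next has become NULL. */
	head.next = NULL;
	bool refused = !list_add_checked(&b, &head, &a);

	return (ok && refused) ? 0 : 1;
}
```

Calling run_demo() from a main() of your own prints one "prev->next should
be next" message to stderr, the same class of inconsistency that the
warning reports. It says nothing about which code left the list in that
state.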

Thanks,

Bart.



