Re: SRPt oops with 4.5-rc3-ish

On 02/28/2016 03:26 AM, Nicholas A. Bellinger wrote:

> AFAIK, the oldest last working srpt commit with se_node_acl + se_session
> active I/O shutdown is:
> 
> ib_srpt: Call target_sess_cmd_list_set_waiting during shutdown_session
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/infiniband/ulp/srpt?id=1d19f7800d
> 
> Note this is ~40 upstream commits between then and now in v4.5-rc5.
> 
> Please confirm when you started triggering this regression during target
> service restart.

I don't have a clear answer for that, although it just happened again on
a v4.5-rc4 kernel.  It's pretty annoying because the trigger is (as
often as anything else) a yum upgrade process, and it hangs midway
through the process.  I don't want to know how corrupted my RPM db or my
filesystem is :-(

Anyway, I have a clearer oops this time that I'll attach here, but this
will be my last one from this kernel as I'm upgrading to the most recent
v4.6-rc kernel.  If the oops still happens on v4.6-rc, I'll update here.

Here's the oops series; the machine was useless after this (disk access
was blocked for all processes):

[4752021.950589] ------------[ cut here ]------------
[4752021.955992] WARNING: CPU: 5 PID: 10364 at
drivers/infiniband/ulp/srpt/ib_srpt.c:3251
srpt_close_session+0x12f/0x140 [ib_srpt]()
[4752021.969091] Modules linked in: hfi1(C) 8021q garp mrp
target_core_user uio target_core_pscsi target_core_file
target_core_iblock ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ip_set
nfnetlink ebtable_nat ebtable_filter ebtable_broute bridge stp llc
ebtables ip6table_mangle ip6table_raw nf_defrag_ipv6 ip6table_security
ip6table_filter ip6_tables iptable_mangle iptable_raw nf_defrag_ipv4
nf_conntrack(-) iptable_security ib_isert iscsi_target_mod ib_iser
libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp
scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm
ib_cm iw_cm ib_sa ib_mad intel_rapl x86_pkg_temp_thermal coretemp
kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul ipmi_devintf iTCO_wdt
crc32_pclmul ghash_clmulni_intel iTCO_vendor_support dcdbas ipmi_si
sb_edac mei_me edac_core
[4752022.049588]  ioatdma mei ipmi_msghandler lpc_ich dca shpchp wmi
acpi_power_meter tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc
xfs libcrc32c mlx5_ib raid1 raid0 ib_core ib_addr mgag200 i2c_algo_bit
drm_kms_helper ttm crc32c_intel mlx5_core tg3 drm ptp megaraid_sas
pps_core fjes [last unloaded: nf_conntrack_ipv6]
[4752022.080463] CPU: 5 PID: 10364 Comm: targetctl Tainted: G         CI
    4.5.0-0.rc4.git0.1.fc24.x86_64 #1
[4752022.091366] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS
1.0.4 08/28/2014
[4752022.100131]  0000000000000286 00000000189b0c8a ffff880de32ffcc0
ffffffff813d3e0f
[4752022.108624]  0000000000000000 ffffffffa04872f0 ffff880de32ffcf8
ffffffff810a4fe2
[4752022.117126]  ffff881fd427a800 ffff88100fcb7000 0000000000000001
ffff88100fcb70e8
[4752022.125629] Call Trace:
[4752022.128565]  [<ffffffff813d3e0f>] dump_stack+0x63/0x84
[4752022.134513]  [<ffffffff810a4fe2>] warn_slowpath_common+0x82/0xc0
[4752022.141431]  [<ffffffff810a512a>] warn_slowpath_null+0x1a/0x20
[4752022.148155]  [<ffffffffa04830bf>] srpt_close_session+0x12f/0x140
[ib_srpt]
[4752022.156055]  [<ffffffffa0639de4>] target_release_session+0x24/0x30
[target_core_mod]
[4752022.164925]  [<ffffffffa063bb3d>] target_put_session+0x1d/0x20
[target_core_mod]
[4752022.173403]  [<ffffffffa06395eb>]
core_tpg_del_initiator_node_acl+0x16b/0x240 [target_core_mod]
[4752022.183343]  [<ffffffffa062d23f>]
target_fabric_nacl_base_release+0x3f/0x50 [target_core_mod]
[4752022.193082]  [<ffffffff812cc133>] config_item_release+0x63/0xd0
[4752022.199902]  [<ffffffff812cc1c2>] config_item_put+0x22/0x30
[4752022.206326]  [<ffffffff812ca676>] configfs_rmdir+0x1d6/0x2e0
[4752022.212857]  [<ffffffff8124ea0c>] vfs_rmdir+0xbc/0x130
[4752022.218803]  [<ffffffff81253c6a>] do_rmdir+0x19a/0x220
[4752022.224750]  [<ffffffff81254a16>] SyS_rmdir+0x16/0x20
[4752022.230598]  [<ffffffff817cd6ae>] entry_SYSCALL_64_fastpath+0x12/0x6d
[4752022.238009] ---[ end trace befc2f337e9f56d7 ]---
[4752027.739051] ib_srpt Received IB DREQ ERROR event.
[4752029.794988] ib_srpt Received IB TimeWait exit for cm_id
ffff881ff5d55800.
[4752029.807121] BUG: unable to handle kernel paging request at
0000000000017930
[4752029.815120] IP: [<ffffffff810ee9a5>]
queued_spin_lock_slowpath+0x105/0x190
[4752029.823015] PGD 0
[4752029.825466] Oops: 0002 [#1] SMP
[4752029.829286] Modules linked in: hfi1(C) 8021q garp mrp
target_core_user uio target_core_pscsi target_core_file
target_core_iblock ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ip_set
nfnetlink ebtable_nat ebtable_filter ebtable_broute bridge stp llc
ebtables ip6table_mangle ip6table_raw nf_defrag_ipv6 ip6table_security
ip6table_filter ip6_tables iptable_mangle iptable_raw nf_defrag_ipv4
nf_conntrack(-) iptable_security ib_isert iscsi_target_mod ib_iser
libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp
scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm
ib_cm iw_cm ib_sa ib_mad intel_rapl x86_pkg_temp_thermal coretemp
kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul ipmi_devintf iTCO_wdt
crc32_pclmul ghash_clmulni_intel iTCO_vendor_support dcdbas ipmi_si
sb_edac mei_me edac_core
[4752029.913124]  ioatdma mei ipmi_msghandler lpc_ich dca shpchp wmi
acpi_power_meter tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc
xfs libcrc32c mlx5_ib raid1 raid0 ib_core ib_addr mgag200 i2c_algo_bit
drm_kms_helper ttm crc32c_intel mlx5_core tg3 drm ptp megaraid_sas
pps_core fjes [last unloaded: nf_conntrack_ipv6]
[4752029.946121] CPU: 7 PID: 288828 Comm: kworker/7:0 Tainted: G
WCI     4.5.0-0.rc4.git0.1.fc24.x86_64 #1
[4752029.958057] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS
1.0.4 08/28/2014
[4752029.967563] Workqueue: events srpt_release_channel_work [ib_srpt]
[4752029.975315] task: ffff8820352e5b80 ti: ffff881f5da10000 task.ti:
ffff881f5da10000
[4752029.984607] RIP: 0010:[<ffffffff810ee9a5>]  [<ffffffff810ee9a5>]
queued_spin_lock_slowpath+0x105/0x190
[4752029.995941] RSP: 0018:ffff881f5da13da8  EFLAGS: 00010006
[4752030.002790] RAX: 0000000000017930 RBX: 0000000000000286 RCX:
ffff88203d2d7900
[4752030.011668] RDX: 00000000000039eb RSI: 00000000e7b31ae8 RDI:
ffff880de32ffd20
[4752030.020528] RBP: ffff881f5da13da8 R08: 0000000000200000 R09:
0000000000000000
[4752030.029374] R10: 0000000000000000 R11: 000000000001a700 R12:
ffff880de32ffd18
[4752030.038206] R13: ffff881fd2c6b780 R14: ffff881fd427a800 R15:
ffff881fd427a8d0
[4752030.047025] FS:  0000000000000000(0000) GS:ffff88203d2c0000(0000)
knlGS:0000000000000000
[4752030.056913] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4752030.064174] CR2: 0000000000017930 CR3: 0000000de33db000 CR4:
00000000001406e0
[4752030.072995] Stack:
[4752030.076087]  ffff881f5da13dc0 ffffffff817cd4c7 ffff880de32ffd20
ffff881f5da13de8
[4752030.085236]  ffffffff810e7cfd ffff881fd427a8d0 ffff88100fcb7000
ffff881fd2c6b780
[4752030.094382]  ffff881f5da13e18 ffffffffa0485931 ffff881fc81c60c0
ffff88203d2d65c0
[4752030.103531] Call Trace:
[4752030.107120]  [<ffffffff817cd4c7>] _raw_spin_lock_irqsave+0x37/0x40
[4752030.114886]  [<ffffffff810e7cfd>] complete+0x1d/0x50
[4752030.121291]  [<ffffffffa0485931>]
srpt_release_channel_work+0xe1/0x140 [ib_srpt]
[4752030.130416]  [<ffffffff810bd6fd>] process_one_work+0x1ad/0x400
[4752030.137791]  [<ffffffff810bd99e>] worker_thread+0x4e/0x480
[4752030.144772]  [<ffffffff810bd950>] ? process_one_work+0x400/0x400
[4752030.152327]  [<ffffffff810bd950>] ? process_one_work+0x400/0x400
[4752030.159879]  [<ffffffff810c38e8>] kthread+0xd8/0xf0
[4752030.166170]  [<ffffffff810c3810>] ? kthread_worker_fn+0x180/0x180
[4752030.173823]  [<ffffffff817cd9ff>] ret_from_fork+0x3f/0x70
[4752030.180702]  [<ffffffff810c3810>] ? kthread_worker_fn+0x180/0x180
[4752030.188352] Code: 02 89 c2 45 31 c9 c1 e2 10 85 d2 74 41 c1 ea 12
83 e0 03 83 ea 01 48 c1 e0 04 48 63 d2 48 05 00 79 01 00 48 03 04 d5 00
d5 d3 81 <48> 89 08 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 4c 8b
[4752030.211521] RIP  [<ffffffff810ee9a5>]
queued_spin_lock_slowpath+0x105/0x190
[4752030.220180]  RSP <ffff881f5da13da8>
[4752030.224954] CR2: 0000000000017930
[4752030.231895] ---[ end trace befc2f337e9f56d8 ]---
[4752030.312493] BUG: unable to handle kernel paging request at
ffffffffffffffd8
[4752030.322906] IP: [<ffffffff810c3f80>] kthread_data+0x10/0x20
[4752030.331299] PGD 1c0d067 PUD 1c0f067 PMD 0
[4752030.337938] Oops: 0000 [#2] SMP
[4752030.343539] Modules linked in: hfi1(C) 8021q garp mrp
target_core_user uio target_core_pscsi target_core_file
target_core_iblock ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ip_set
nfnetlink ebtable_nat ebtable_filter ebtable_broute bridge stp llc
ebtables ip6table_mangle ip6table_raw nf_defrag_ipv6 ip6table_security
ip6table_filter ip6_tables iptable_mangle iptable_raw nf_defrag_ipv4
nf_conntrack(-) iptable_security ib_isert iscsi_target_mod ib_iser
libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp
scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm
ib_cm iw_cm ib_sa ib_mad intel_rapl x86_pkg_temp_thermal coretemp
kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul ipmi_devintf iTCO_wdt
crc32_pclmul ghash_clmulni_intel iTCO_vendor_support dcdbas ipmi_si
sb_edac mei_me edac_core
[4752030.432786]  ioatdma mei ipmi_msghandler lpc_ich dca shpchp wmi
acpi_power_meter tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc
xfs libcrc32c mlx5_ib raid1 raid0 ib_core ib_addr mgag200 i2c_algo_bit
drm_kms_helper ttm crc32c_intel mlx5_core tg3 drm ptp megaraid_sas
pps_core fjes [last unloaded: nf_conntrack_ipv6]
[4752030.467298] CPU: 7 PID: 288828 Comm: kworker/7:0 Tainted: G      D
WCI     4.5.0-0.rc4.git0.1.fc24.x86_64 #1
[4752030.479665] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS
1.0.4 08/28/2014
[4752030.489575] task: ffff8820352e5b80 ti: ffff881f5da10000 task.ti:
ffff881f5da10000
[4752030.499244] RIP: 0010:[<ffffffff810c3f80>]  [<ffffffff810c3f80>]
kthread_data+0x10/0x20
[4752030.509511] RSP: 0018:ffff881f5da13a80  EFLAGS: 00010002
[4752030.516747] RAX: 0000000000000000 RBX: 0000000000000007 RCX:
0000000000000007
[4752030.526034] RDX: ffff88103d410000 RSI: 0000000000000007 RDI:
ffff8820352e5b80
[4752030.535318] RBP: ffff881f5da13a80 R08: ffff8820352e5c28 R09:
ffff8820352e5c00
[4752030.544599] R10: 0000000000000000 R11: 000000000000002f R12:
0000000000016dc0
[4752030.553884] R13: ffff8820352e61d8 R14: ffff8820352e5b80 R15:
ffff88203d2d6dc0
[4752030.563161] FS:  0000000000000000(0000) GS:ffff88203d2c0000(0000)
knlGS:0000000000000000
[4752030.573516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4752030.581247] CR2: 0000000000000028 CR3: 0000000de33db000 CR4:
00000000001406e0
[4752030.590525] Stack:
[4752030.594064]  ffff881f5da13a98 ffffffff810be581 ffff88203d2d6dc0
ffff881f5da13ae8
[4752030.603691]  ffffffff817c91ba 00ff881f652b6478 ffff881f00000007
ffff8820352e5b80
[4752030.613311]  ffff881f5da10000 0000000000000000 ffff881f5da13b38
ffff881f5da135d0
[4752030.622926] Call Trace:
[4752030.626959]  [<ffffffff810be581>] wq_worker_sleeping+0x11/0x90
[4752030.634789]  [<ffffffff817c91ba>] __schedule+0x62a/0x9b0
[4752030.642030]  [<ffffffff817c957c>] schedule+0x3c/0x90
[4752030.648874]  [<ffffffff810a7f48>] do_exit+0x7a8/0xb30
[4752030.655813]  [<ffffffff8101992a>] oops_end+0x9a/0xd0
[4752030.662650]  [<ffffffff81067e7e>] no_context+0x13e/0x390
[4752030.669886]  [<ffffffff81068150>] __bad_area_nosemaphore+0x80/0x1f0
[4752030.678193]  [<ffffffff810682d3>] bad_area_nosemaphore+0x13/0x20
[4752030.686209]  [<ffffffff81068597>] __do_page_fault+0xb7/0x400
[4752030.693834]  [<ffffffff81068910>] do_page_fault+0x30/0x80
[4752030.701166]  [<ffffffff817cfa48>] page_fault+0x28/0x30
[4752030.708210]  [<ffffffff810ee9a5>] ?
queued_spin_lock_slowpath+0x105/0x190
[4752030.717062]  [<ffffffff817cd4c7>] _raw_spin_lock_irqsave+0x37/0x40
[4752030.725221]  [<ffffffff810e7cfd>] complete+0x1d/0x50
[4752030.731999]  [<ffffffffa0485931>]
srpt_release_channel_work+0xe1/0x140 [ib_srpt]
[4752030.741523]  [<ffffffff810bd6fd>] process_one_work+0x1ad/0x400
[4752030.749298]  [<ffffffff810bd99e>] worker_thread+0x4e/0x480
[4752030.756677]  [<ffffffff810bd950>] ? process_one_work+0x400/0x400
[4752030.764626]  [<ffffffff810bd950>] ? process_one_work+0x400/0x400
[4752030.772558]  [<ffffffff810c38e8>] kthread+0xd8/0xf0
[4752030.779231]  [<ffffffff810c3810>] ? kthread_worker_fn+0x180/0x180
[4752030.787241]  [<ffffffff817cd9ff>] ret_from_fork+0x3f/0x70
[4752030.794438]  [<ffffffff810c3810>] ? kthread_worker_fn+0x180/0x180
[4752030.802395] Code: 97 69 70 00 e9 53 ff ff ff e8 4d 0e fe ff 0f 1f
00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 e0 05 00 00 55
48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[4752030.826210] RIP  [<ffffffff810c3f80>] kthread_data+0x10/0x20
[4752030.833669]  RSP <ffff881f5da13a80>
[4752030.838651] CR2: ffffffffffffffd8
[4752030.843418] ---[ end trace befc2f337e9f56d9 ]---
[4752030.933774] Fixing recursive fault but reboot is needed!
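
As a reading aid for the two "Oops:" lines above, here is a minimal
sketch that decodes the hex value printed after "Oops:" as an x86
page-fault error code.  The bit meanings are the standard x86 layout;
the helper name decode_oops_error_code is made up for this illustration
and is not taken from the kernel or from ib_srpt.

```python
# Decode the hex value printed after "Oops:" on x86 (the page-fault error code).
# Bit meanings follow the x86 page-fault error code layout; the helper name
# decode_oops_error_code is hypothetical and exists only for this illustration.

def decode_oops_error_code(code_hex: str) -> dict:
    code = int(code_hex, 16)
    return {
        "page_present": bool(code & 0x1),       # 0: page not present, 1: protection violation
        "write": bool(code & 0x2),              # 0: read access, 1: write access
        "user_mode": bool(code & 0x4),          # 0: fault in kernel mode, 1: in user mode
        "reserved_bit": bool(code & 0x8),       # 1: reserved bit set in a page-table entry
        "instruction_fetch": bool(code & 0x10), # 1: fault on an instruction fetch
    }

if __name__ == "__main__":
    # The two codes from the log above: "Oops: 0002 [#1]" and "Oops: 0000 [#2]".
    for label, code in (("Oops #1", "0002"), ("Oops #2", "0000")):
        print(label, decode_oops_error_code(code))
```

Read this way, the first oops is a kernel-mode write to a page that is
not present, and the second is a kernel-mode read of a page that is not
present.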




-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: 0E572FDD

