Re: Kernel Crash on iSCSI volume create

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/26/17 1:51 AM, Nicholas A. Bellinger wrote:

Hey Bryant,

On Wed, 2017-05-24 at 13:34 -0500, Bryant G. Ly wrote:
Hi Nick,

We are seeing the below kernel crash with a block backstore.

The steps to reproduce are: Discovered a new iSCSI volume, created a new ibmvscsis instance, mapped it.

Mmmm, based upon the INQUIRY string this looks like a PSCSI backstore,
yes..?

No, its a block backstore, we dont support PSCSI via our management console (Novalink).
I don't know why the inquiry string would show it as a PSCSI backstore.


So a iscsi initiator provided struct scsi_device is registered
with /sys/kernel/config/target/core/pscsi_*, and then mapped to a
ibmvscsis lun, right..?

Btw looking at 4.8.17, nothing has changed since Jan
in /drivers/target/, and certainly nothing related to configfs operation
well before that..

I assume this not a new regression, right..?

We just started testing with iSCSI, so we did not see it before.

[ 7872.359650] scsi host6: iSCSI Initiator over TCP/IP
[ 7872.621777] scsi 6:0:0:0: Direct-Access     IBM      2145             0000 PQ: 0 ANSI: 6
[ 7872.665105] sd 6:0:0:0: Attached scsi generic sg14 type 0
[ 7872.665600] sd 6:0:0:0: [sdg] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 7872.666986] sd 6:0:0:0: [sdg] Write Protect is off
[ 7872.666990] sd 6:0:0:0: [sdg] Mode Sense: 97 00 10 08
[ 7872.667691] sd 6:0:0:0: [sdg] Write cache: disabled, read cache: enabled, supports DPO and FUA
[ 7872.680928] sd 6:0:0:0: [sdg] Attached SCSI disk
[ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added
[ 7879.893760] ------------[ cut here ]------------
[ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs]
[ 7879.893768] Modules linked in: target_core_pscsi target_core_file target_core_iblock rpadlpar_io rpaphp ip6table_raw xt_CT xt_mac xt_tcpudp xt_comment xt_physdev xt_set ip_set_hash_net ip_set iscsi_target_mod iptable_raw dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag openvswitch nf_nat_ipv6 libcrc32c target_core_user uio binfmt_misc pseries_rng ibmvmc(O) vmx_crypto rtc_generic xt_conntrack nf_nat_ftp nf_conntrack_ftp nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv6 nf_defrag_ipv6 nbd ipt_REJECT nf_reject_ipv4 ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables ip6table_filter ip6_tables ebtables x_tables br_netfilter
[ 7879.893797]  bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 ses enclosure scsi_transport_sas ibmvscsis target_core_mod configfs ibmvscsi ibmveth(O) qla2xxx scsi_transport_fc ipr
[ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G           O    4.8.17-customv2.22 #12
[ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000
[ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870
[ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700   Tainted: G           O     (4.8.17-customv2.22)
[ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28222242  XER: 00000000
[ 7879.893820] CFAR: d000000002c664bc SOFTE: 1
                 GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820
                 GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000
                 GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80
                 GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40
                 GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940
                 GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000
                 GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490
                 GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940
[ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs]
[ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
[ 7879.893842] Call Trace:
[ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
[ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460
[ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490
[ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170
[ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390
[ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec
[ 7879.893856] Instruction dump:
[ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000
[ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000
[ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]---
[ 7912.992583] sd 6:0:0:0: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 7912.992587] sd 6:0:0:0: [sdg] tag#0 Sense Key : Illegal Request [current]
[ 7912.992589] sd 6:0:0:0: [sdg] tag#0 Add. Sense: Logical unit not supported
[ 7912.992591] sd 6:0:0:0: [sdg] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 80 00
[ 7912.992593] blk_update_request: I/O error, dev sdg, sector 0

The kref_get() on struct config_item->ci_kref in config_item_get() is
triggering a sanity check warning struct kref has reached zero, while a
new kref_get() is attempting to take a new reference.

Normally it means kref_get() should be using kref_get_unless_zero(), but
since this has never been reported before, it makes me think something
else outside of configfs proper is going on..

What configfs operations are triggering this warning, and what do the
configfs operations preceding it look like..?

Also, is this specific to PSCSI export or does it trigger independent of
target-core backend driver..?

They were attaching a iSCSI volume, so basically we do a discover/login and then targetcli
is called to do the mappings. Also between the attach there were a few ls queries.
All this is done via Novalink API(Virtualization management).

So the only configfs operations are through the mapping (backstores/block create) and the ls queries.

A side note is that prior to Mike Christie's fix with "tcmu: fix crash during device removal",
we saw the crash occur with the below stack trace. Then if the system was left in that state... Idk what else is done
in between but the trace I sent in the original email would start to pop up. We had figured the
below stack trace was the culprit for the check_perm problem, but now we see that with the fix from Mike.

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766172] Unable to handle kernel paging request for instruction fetch

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766184] Faulting instruction address: 0x3239332020203230

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766189] Oops: Kernel access of bad area, sig: 11 [#1]

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766192] SMP NR_CPUS=2048 NUMA pSeries

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766172] Unable to handle kernel paging request for instruction fetch

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766184] Faulting instruction address: 0x3239332020203230

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766189] Oops: Kernel access of bad area, sig: 11 [#1]

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766192] SMP NR_CPUS=2048 NUMA pSeries

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766201] Modules linked
in: rpadlpar_io rpaphp mmfs26(O) mmfslinux(O) tracedev(O) openvswitch
dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag
netlink_diag target_core_pscsi target_core_file target_core_iblock
iscsi_target_mod target_core_user uio binfmt_misc pseries_rng ibmvmc(O)
rtc_generic xt_conntrack nf_nat_ftp nf_conntrack_ftp
nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns
nf_conntrack_broadcast nf_conntrack_ipv6 nf_defrag_ipv6 nbd br_netfilter
 bridge ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr
vhost_net vhost macvtap macvlan iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi ip_vs iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 iptable_mangle iptable_filter ipt_REJECT nf_reject_ipv4
ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack ip_tables
ip6table_filter ip6_tables ebtables x_tables dm_snapshot dm_bufio
dm_multipath 8021q garp mrp stp llc autofs4 ses enclosure ibmvscsis
target_core_mod configfs ibmveth bnx2x lpfc ipr vxlan ip6_udp_tunnel
udp_tunnel scsi_transport_fc mdio libcrc32c

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766327] CPU: 22 PID:
4855 Comm: tcmu-runner Tainted: G           O    4.4.21-customv1.29 #6

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766332] task: c00000097779cf60 ti: c00000097785c000 task.ti: c00000097785c000

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766337] NIP: 3239332020203230 LR: d0000000114211ac CTR: 3239332020203233

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766341] REGS:
c00000097785fa90 TRAP: 0400   Tainted: G           O
(4.4.21-customv1.29)

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766345] MSR:
8000000140009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24004824  XER:
20000000

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766359] CFAR: c000000000008468 SOFTE: 1

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766359] GPR00: c0000000002e7a20 c00000097785fd10 d00000001142abe8 c00000090ff30f58

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766359] GPR04: c000000aa83c5a08 c000000799814b10 0000000000000001 0000000000000000

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766359] GPR08: 0000000000000000 3239332020203233 c000000001945400 0000000000000c01

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766359] GPR12: 3239332020203233 c00000000fb4d100 0000000000000009 0000000000000007

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766359] GPR16: 000000000000000a 0000000000000000 0000000000000008 00003ffffc65deb4

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766359] GPR20: 0000000000000005 00003ffffc65deb8 000000000000007d 000001002646ede0

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766359] GPR24: 0000010026513380 c000000799814b10 c000000aa83c5a08 c00000084212c3c0

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766359] GPR28: c000000aaa6a0020 c000000aa83c5a08 c00000090cf2b018 c000000aa4c50dc0

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766413] NIP [3239332020203230] 0x3239332020203230

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766422] LR [d0000000114211ac] uio_release+0x5c/0xe0 [uio]

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766426] Call Trace:

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766429] [c00000097785fd10] [c000000aa83c5a08] 0xc000000aa83c5a08 (unreliable)

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766438] [c00000097785fd50] [c0000000002e7a20] __fput+0xe0/0x310

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766446] [c00000097785fdb0] [c0000000000e2c10] task_work_run+0xf0/0x130

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766452] [c00000097785fe00] [c0000000000178d4] do_notify_resume+0xc4/0xd0

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766462] [c00000097785fe30] [c000000000009838] ret_from_except_lite+0x64/0x68

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766467] Instruction dump:

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766472] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766479] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Apr 18 11:31:19 cs-tul13-neo kernel: [71187.766492] ---[ end trace 22916847693b7599 ]---


-Bryant




--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux