Re: LIO crashing Fedora box, multiple versions and kernels tested

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I presented a new LUN (and no others) with emulate-caw set to 0 via
the following command:
echo 0 > /sys/kernel/config/target/core/iblock_1/test/attrib/emulate_caw

Upon starting the first ESXi host after configuring the new LUN, I
started getting the following on the console every few seconds:

Message from syslogd@labsan2 at Apr  7 23:00:29 ...
 kernel:[17424.117000] NMI watchdog: BUG: soft lockup - CPU#2 stuck
for 22s! [systemd-udevd:1782]

after a few minutes of this (and while trying to get more logs) I
started getting the following call trace on the console:

Apr  7 23:11:25 labsan2 kernel: [18080.117000]  [<ffffffff811d3d03>]
handle_mm_fault+0xe43/0x1850
Apr  7 23:11:25 labsan2 kernel: [18080.117000]  [<ffffffff812686ae>] ?
ep_scan_ready_list+0x20e/0x220
Apr  7 23:11:25 labsan2 kernel: [18080.117000]  [<ffffffff81268930>] ?
ep_poll+0x240/0x440
Apr  7 23:11:25 labsan2 kernel: [18080.117000]  [<ffffffff81064d13>]
__do_page_fault+0x193/0x450
Apr  7 23:11:25 labsan2 kernel: [18080.117000]  [<ffffffff81065001>]
do_page_fault+0x31/0x70
Apr  7 23:11:25 labsan2 kernel: [18080.117000]  [<ffffffff817887c8>]
page_fault+0x28/0x30
Apr  7 23:11:25 labsan2 kernel: [18080.117000] Code: 00 48 03 14 c5 c0
ff d0 81 48 89 de 48 89 df e8 6d 69 29 00 84 c0 74 13 44 89 e7 ff 15
50 9e b0 00 eb 08 66 0f 1f 44 00 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48
8b 7d e8 65 48 33 3c 25 28 00 00 00
Apr  7 23:11:26 labsan2 kernel: [18081.031504] ABORT_TASK: Found
referenced qla2xxx task_tag: 1139688
Apr  7 23:11:26 labsan2 kernel: [18081.031506] ABORT_TASK: ref_tag:
1139688 already complete, skipping
Apr  7 23:11:26 labsan2 kernel: [18081.031508] ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1139688
Apr  7 23:11:53 labsan2 kernel: [18108.117000] NMI watchdog: BUG: soft
lockup - CPU#2 stuck for 22s! [systemd-udevd:1782]
Apr  7 23:11:53 labsan2 kernel: [18108.117000] Modules linked in:
tcm_qla2xxx target_core_user uio target_core_pscsi target_core_file
target_core_iblock iscsi_target_mod target_core_mod ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute
bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
iptable_security iptable_raw iTCO_wdt iTCO_vendor_support gpio_ich
coretemp lpc_ich mfd_core kvm_intel kvm ipmi_devintf ipmi_ssif
serio_raw i5000_edac ioatdma dca shpchp edac_core ipmi_si i5k_amb
ipmi_msghandler acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace
sunrpc xfs libcrc32c radeon i2c_algo_bit drm_kms_helper qla2xxx ttm
mptsas drm scsi_transport_sas mptscsih ata_generic pata_acpi mptbase
bnx2 scsi_transport_fc
Apr  7 23:11:53 labsan2 kernel: [18108.117000] CPU: 2 PID: 1782 Comm:
systemd-udevd Tainted: G        W    L  4.0.0-0.rc2.git0.1.fc22.x86_64
#1
Apr  7 23:11:53 labsan2 kernel: [18108.117000] Hardware name: IBM IBM
eServer BladeCenter HS21 -[8853L6U]-/Server Blade, BIOS
-[BCE142BUS-1.18]- 06/17/2009
Apr  7 23:11:53 labsan2 kernel: [18108.117000] task: ffff8800c9f84f00
ti: ffff88012a344000 task.ti: ffff88012a344000
Apr  7 23:11:53 labsan2 kernel: [18108.117000] RIP:
0010:[<ffffffff8111b5aa>]  [<ffffffff8111b5aa>]
generic_exec_single+0xea/0x1a0
Apr  7 23:11:53 labsan2 kernel: [18108.117000] RSP:
0000:ffff88012a347be8  EFLAGS: 00000202
Apr  7 23:11:53 labsan2 kernel: [18108.117000] RAX: 00000000000008fb
RBX: 0000000000000141 RCX: 0000000000000000
Apr  7 23:11:53 labsan2 kernel: [18108.117000] RDX: 00000000000008fb
RSI: 00000000000000fb RDI: 0000000000000282
Apr  7 23:11:53 labsan2 kernel: [18108.117000] RBP: ffff88012a347c28
R08: 0000000000000001 R09: ffff880000000e18
Apr  7 23:11:53 labsan2 kernel: [18108.117000] R10: ffff880000000000
R11: 0000000000130000 R12: 000200da00000001
Apr  7 23:11:53 labsan2 kernel: [18108.117000] R13: 0000000000000000
R14: ffff88012ffe8d80 R15: 0000010000000000
Apr  7 23:11:53 labsan2 kernel: [18108.117000] FS:
00007f41505c5880(0000) GS:ffff88012fd00000(0000)
knlGS:0000000000000000
Apr  7 23:11:53 labsan2 kernel: [18108.117000] CS:  0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Apr  7 23:11:53 labsan2 kernel: [18108.117000] CR2: 00007ffcc75c3fa8
CR3: 000000012a324000 CR4: 00000000000007e0
Apr  7 23:11:53 labsan2 kernel: [18108.117000] Stack:
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  ffff88012a347c08
0000000000000000 ffffffff8106be30 ffff88012a347cb8
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  0000000000000003
00000000cfa942cb 0000000000000001 ffffffff8106be30
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  ffff88012a347c58
ffffffff8111b6c0 ffff88012a347c58 ffffffff8139ec2b
Apr  7 23:11:53 labsan2 kernel: [18108.117000] Call Trace:
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  [<ffffffff8106be30>] ?
do_flush_tlb_all+0x50/0x50
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  [<ffffffff8106be30>] ?
do_flush_tlb_all+0x50/0x50
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  [<ffffffff8111b6c0>]
smp_call_function_single+0x60/0xa0
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  [<ffffffff8139ec2b>] ?
cpumask_next_and+0x3b/0x50
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  [<ffffffff8106be30>] ?
do_flush_tlb_all+0x50/0x50
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  [<ffffffff8111bce3>]
smp_call_function_many+0x233/0x280
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  [<ffffffff8106c048>]
native_flush_tlb_others+0xb8/0xc0
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  [<ffffffff8106c2e4>]
flush_tlb_page+0x54/0x90
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  [<ffffffff811e556e>]
ptep_clear_flush+0x4e/0x60
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  [<ffffffff811d2333>]
do_wp_page+0x333/0x940
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  [<ffffffff811d3d03>]
handle_mm_fault+0xe43/0x1850
Apr  7 23:11:53 labsan2 kernel: [18108.117000]  [<ffffffff812686ae>] ?
ep_scan_ready_list+0x20e/0x220

Thanks again

On Tue, Apr 7, 2015 at 9:30 PM, Nicholas A. Bellinger
<nab@xxxxxxxxxxxxxxx> wrote:
> On Tue, 2015-04-07 at 20:41 -0400, Dan Lane wrote:
>> Initiators are the same as the crashy storage, QLA2462.  Here's what
>> lspci shows for them if it helps:
>>
>> 0d:01.0 Fibre Channel: QLogic Corp. ISP2422-based 4Gb Fibre Channel to
>> PCI-X HBA (rev 02)
>> 0d:01.1 Fibre Channel: QLogic Corp. ISP2422-based 4Gb Fibre Channel to
>> PCI-X HBA (rev 02)
>>
>> I will work on getting storage presented from one of the problematic
>> systems with the emulate_caw=0 setting, can you be more specific on
>> where this should be set?
>
> emulate_caw is a backend device attribute.  It can be changed for each
> backend device within the configuration file before restarting the
> target service, or in /backstores/iblock/$DEVICE using targetcli.
>
> Note that this will need to happen before the ESX FC initiator connects,
> and performs the LUN SCAN to query if COMPARE_AND_WRITE is supported or
> not.
>
> --nab
>
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux