I presented a new LUN (and no others) with emulate-caw set to 0 via the following command: echo 0 > /sys/kernel/config/target/core/iblock_1/test/attrib/emulate_caw Upon starting the first ESXi host after configuring the new LUN, I started getting the following on the console every few seconds: Message from syslogd@labsan2 at Apr 7 23:00:29 ... kernel:[17424.117000] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-udevd:1782] after a few minutes of this (and while trying to get more logs) I started getting the following call trace on the console: Apr 7 23:11:25 labsan2 kernel: [18080.117000] [<ffffffff811d3d03>] handle_mm_fault+0xe43/0x1850 Apr 7 23:11:25 labsan2 kernel: [18080.117000] [<ffffffff812686ae>] ? ep_scan_ready_list+0x20e/0x220 Apr 7 23:11:25 labsan2 kernel: [18080.117000] [<ffffffff81268930>] ? ep_poll+0x240/0x440 Apr 7 23:11:25 labsan2 kernel: [18080.117000] [<ffffffff81064d13>] __do_page_fault+0x193/0x450 Apr 7 23:11:25 labsan2 kernel: [18080.117000] [<ffffffff81065001>] do_page_fault+0x31/0x70 Apr 7 23:11:25 labsan2 kernel: [18080.117000] [<ffffffff817887c8>] page_fault+0x28/0x30 Apr 7 23:11:25 labsan2 kernel: [18080.117000] Code: 00 48 03 14 c5 c0 ff d0 81 48 89 de 48 89 df e8 6d 69 29 00 84 c0 74 13 44 89 e7 ff 15 50 9e b0 00 eb 08 66 0f 1f 44 00 00 f3 90 <f6> 43 18 01 75 f8 31 c0 48 8b 7d e8 65 48 33 3c 25 28 00 00 00 Apr 7 23:11:26 labsan2 kernel: [18081.031504] ABORT_TASK: Found referenced qla2xxx task_tag: 1139688 Apr 7 23:11:26 labsan2 kernel: [18081.031506] ABORT_TASK: ref_tag: 1139688 already complete, skipping Apr 7 23:11:26 labsan2 kernel: [18081.031508] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1139688 Apr 7 23:11:53 labsan2 kernel: [18108.117000] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-udevd:1782] Apr 7 23:11:53 labsan2 kernel: [18108.117000] Modules linked in: tcm_qla2xxx target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iTCO_wdt iTCO_vendor_support gpio_ich coretemp lpc_ich mfd_core kvm_intel kvm ipmi_devintf ipmi_ssif serio_raw i5000_edac ioatdma dca shpchp edac_core ipmi_si i5k_amb ipmi_msghandler acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c radeon i2c_algo_bit drm_kms_helper qla2xxx ttm mptsas drm scsi_transport_sas mptscsih ata_generic pata_acpi mptbase bnx2 scsi_transport_fc Apr 7 23:11:53 labsan2 kernel: [18108.117000] CPU: 2 PID: 1782 Comm: systemd-udevd Tainted: G W L 4.0.0-0.rc2.git0.1.fc22.x86_64 #1 Apr 7 23:11:53 labsan2 kernel: [18108.117000] Hardware name: IBM IBM eServer BladeCenter HS21 -[8853L6U]-/Server Blade, BIOS -[BCE142BUS-1.18]- 06/17/2009 Apr 7 23:11:53 labsan2 kernel: [18108.117000] task: ffff8800c9f84f00 ti: ffff88012a344000 task.ti: ffff88012a344000 Apr 7 23:11:53 labsan2 kernel: [18108.117000] RIP: 0010:[<ffffffff8111b5aa>] [<ffffffff8111b5aa>] generic_exec_single+0xea/0x1a0 Apr 7 23:11:53 labsan2 kernel: [18108.117000] RSP: 0000:ffff88012a347be8 EFLAGS: 00000202 Apr 7 23:11:53 labsan2 kernel: [18108.117000] RAX: 00000000000008fb RBX: 0000000000000141 RCX: 0000000000000000 Apr 7 23:11:53 labsan2 kernel: [18108.117000] RDX: 00000000000008fb RSI: 00000000000000fb RDI: 0000000000000282 Apr 7 23:11:53 labsan2 kernel: [18108.117000] RBP: ffff88012a347c28 R08: 0000000000000001 R09: ffff880000000e18 Apr 7 23:11:53 labsan2 kernel: [18108.117000] R10: ffff880000000000 R11: 0000000000130000 R12: 000200da00000001 Apr 7 23:11:53 labsan2 kernel: [18108.117000] R13: 0000000000000000 R14: ffff88012ffe8d80 R15: 0000010000000000 Apr 7 23:11:53 labsan2 kernel: [18108.117000] FS: 00007f41505c5880(0000) GS:ffff88012fd00000(0000) knlGS:0000000000000000 Apr 7 23:11:53 labsan2 kernel: [18108.117000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Apr 7 23:11:53 labsan2 kernel: [18108.117000] CR2: 00007ffcc75c3fa8 CR3: 000000012a324000 CR4: 00000000000007e0 Apr 7 23:11:53 labsan2 kernel: [18108.117000] Stack: Apr 7 23:11:53 labsan2 kernel: [18108.117000] ffff88012a347c08 0000000000000000 ffffffff8106be30 ffff88012a347cb8 Apr 7 23:11:53 labsan2 kernel: [18108.117000] 0000000000000003 00000000cfa942cb 0000000000000001 ffffffff8106be30 Apr 7 23:11:53 labsan2 kernel: [18108.117000] ffff88012a347c58 ffffffff8111b6c0 ffff88012a347c58 ffffffff8139ec2b Apr 7 23:11:53 labsan2 kernel: [18108.117000] Call Trace: Apr 7 23:11:53 labsan2 kernel: [18108.117000] [<ffffffff8106be30>] ? do_flush_tlb_all+0x50/0x50 Apr 7 23:11:53 labsan2 kernel: [18108.117000] [<ffffffff8106be30>] ? do_flush_tlb_all+0x50/0x50 Apr 7 23:11:53 labsan2 kernel: [18108.117000] [<ffffffff8111b6c0>] smp_call_function_single+0x60/0xa0 Apr 7 23:11:53 labsan2 kernel: [18108.117000] [<ffffffff8139ec2b>] ? cpumask_next_and+0x3b/0x50 Apr 7 23:11:53 labsan2 kernel: [18108.117000] [<ffffffff8106be30>] ? do_flush_tlb_all+0x50/0x50 Apr 7 23:11:53 labsan2 kernel: [18108.117000] [<ffffffff8111bce3>] smp_call_function_many+0x233/0x280 Apr 7 23:11:53 labsan2 kernel: [18108.117000] [<ffffffff8106c048>] native_flush_tlb_others+0xb8/0xc0 Apr 7 23:11:53 labsan2 kernel: [18108.117000] [<ffffffff8106c2e4>] flush_tlb_page+0x54/0x90 Apr 7 23:11:53 labsan2 kernel: [18108.117000] [<ffffffff811e556e>] ptep_clear_flush+0x4e/0x60 Apr 7 23:11:53 labsan2 kernel: [18108.117000] [<ffffffff811d2333>] do_wp_page+0x333/0x940 Apr 7 23:11:53 labsan2 kernel: [18108.117000] [<ffffffff811d3d03>] handle_mm_fault+0xe43/0x1850 Apr 7 23:11:53 labsan2 kernel: [18108.117000] [<ffffffff812686ae>] ? ep_scan_ready_list+0x20e/0x220 Thanks again On Tue, Apr 7, 2015 at 9:30 PM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote: > On Tue, 2015-04-07 at 20:41 -0400, Dan Lane wrote: >> Initiators are the same as the crashy storage, QLA2462. Here's what >> lspci shows for them if it helps: >> >> 0d:01.0 Fibre Channel: QLogic Corp. ISP2422-based 4Gb Fibre Channel to >> PCI-X HBA (rev 02) >> 0d:01.1 Fibre Channel: QLogic Corp. ISP2422-based 4Gb Fibre Channel to >> PCI-X HBA (rev 02) >> >> I will work on getting storage presented from one of the problematic >> systems with the emulate_caw=0 setting, can you be more specific on >> where this should be set? > > emulate_caw is a backend device attribute. It can be changed for each > backend device within the configuration file before restarting the > target service, or in /backstores/iblock/$DEVICE using targetcli. > > Note that this will need to happen before the ESX FC initiator connects, > and performs the LUN SCAN to query if COMPARE_AND_WRITE is supported or > not. > > --nab > -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html