We now think this may be related to a problem network switch, so here is the latest dump for LIO and I wonder if this gives any insight as to why it would crash rather than keep retrying:

May 7 23:26:16 roc-4r-scd212 kernel: [129694.169005] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [iscsi_ttx:5831]
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169053] Modules linked in: rbd libceph libcrc32c iscsi_target_mod target_core_file target_core_pscsi target_core_iblock target_core_mod configfs xt_multiport iptable_filter ip_tables x_tables enhanceio_rand(OE) enhanceio_lru(OE) enhanceio_fifo(OE) enhanceio(OE) ipmi_devintf ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac joydev edac_core mei_me nfsd mei lpc_ich auth_rpcgss ioatdma nfs_acl ipmi_si nfs ipmi_msghandler 8250_fintek lockd 8021q grace garp mrp stp sunrpc llc wmi bonding fscache mac_hid lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic igb ahci mpt2sas i2c_algo_bit usbhid libahci dca hid ptp raid_class mlx4_core scsi_transport_sas pps_core
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169116] CPU: 4 PID: 5831 Comm: iscsi_ttx Tainted: G C OEL 4.1.0-040100rc2-generic #201505032335
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169118] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169120] task: ffff8810590b2840 ti: ffff88085a988000 task.ti: ffff88085a988000
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169122] RIP: 0010:[<ffffffff8180374b>] [<ffffffff8180374b>] _raw_spin_unlock_irqrestore+0x1b/0x50
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169131] RSP: 0018:ffff88085a98bd00 EFLAGS: 00000286
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169133] RAX: 0000000000000091 RBX: ffff8808491af440 RCX: ffff8808491af440
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169135] RDX: 0000000000008614 RSI: 0000000000000286 RDI: 0000000000000286
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169136] RBP: ffff88085a98bd08 R08: ffff8808491af510 R09: 0000000000000101
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169138] R10: 0000000000000004 R11: dead000000200200 R12: ffff8808491af510
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169140] R13: 0000000000000101 R14: 0000000000000004 R15: dead000000200200
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169143] FS: 0000000000000000(0000) GS:ffff88085fb00000(0000) knlGS:0000000000000000
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169145] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169147] CR2: 00007f528c2ebe60 CR3: 0000000001e0f000 CR4: 00000000001407e0
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169148] Stack:
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169150] ffff8808491af450 ffff88085a98bd48 ffffffffc0565e0b ffff88085a98bd48
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169153] ffffffffc05bf418 ffff8808491af240 ffff8808491af450 0000000000000001
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169156] 0000000000000001 ffff88085a98bd78 ffffffffc0567fd5 ffff8808491af440
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169160] Call Trace:
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169183] [<ffffffffc0565e0b>] transport_wait_for_tasks+0xbb/0x150 [target_core_mod]
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169199] [<ffffffffc05bf418>] ? iscsit_remove_cmd_from_response_queue+0xe8/0x120 [iscsi_target_mod]
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169213] [<ffffffffc0567fd5>] transport_generic_free_cmd+0xc5/0xe0 [target_core_mod]
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169223] [<ffffffffc05c0716>] iscsit_free_cmd+0x96/0x160 [iscsi_target_mod]
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169233] [<ffffffffc05c98dc>] iscsit_close_connection+0x47c/0x770 [iscsi_target_mod]
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169242] [<ffffffffc05b4c83>] iscsit_take_action_for_connection_exit+0x83/0x110 [iscsi_target_mod]
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169251] [<ffffffffc05c8690>] iscsi_target_tx_thread+0x120/0x1d0 [iscsi_target_mod]
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169257] [<ffffffff810c0630>] ? prepare_to_wait_event+0x100/0x100
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169266] [<ffffffffc05c8570>] ? iscsit_thread_get_cpumask+0xc0/0xc0 [iscsi_target_mod]
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169270] [<ffffffff8109cdc9>] kthread+0xc9/0xe0
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169274] [<ffffffff8109cd00>] ? flush_kthread_worker+0x90/0x90
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169277] [<ffffffff81803fe2>] ret_from_fork+0x42/0x70
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169280] [<ffffffff8109cd00>] ? flush_kthread_worker+0x90/0x90
May 7 23:26:16 roc-4r-scd212 kernel: [129694.169281] Code: 1f 80 00 00 00 00 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 f3 0f 1f 44 00 00 66 83 07 02 48 89 df 57 9d <0f> 1f 44 00 00 5b 5d c3 0f 1f 44 00 00 b8 02 00 00 00 f0 66 0f

On Wed, May 6, 2015 at 12:21 PM, Robert Wood <rwood@xxxxxxxxxxxxxxxxxxx> wrote:
> One update: it appears that Vmware ESXi 5.5 U2 is sometimes
> incorrectly sensing that the LIO-ORG device is an SSD. I wonder if
> that is causing issues with commands being sent? I am testing
> untagging all LIO-ORG devices as SSD to see if the problem recurs.
>
> On Wed, May 6, 2015 at 11:17 AM, Robert Wood <rwood@xxxxxxxxxxxxxxxxxxx> wrote:
>> Good morning, we are continuing to receive:
>>
>> May 6 11:08:26 roc-4r-scd212 kernel: [71898.566185] ABORT_TASK:
>> ref_tag: 3847728 already complete, skipping
>> May 6 11:08:26 roc-4r-scd212 kernel: [71898.566187] ABORT_TASK:
>> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 3847728
>> May 6 11:08:26 roc-4r-scd212 kernel: [71898.566191] Unable to locate
>> ITT: 0x003ab632 on CID: 0
>> May 6 11:08:26 roc-4r-scd212 kernel: [71898.566191] Unable to locate
>> RefTaskTag: 0x003ab632 on CID: 0.
>> May 6 11:08:26 roc-4r-scd212 kernel: [71898.566254] Unexpected ret:
>> -32 send data 48
>> May 6 11:08:30 roc-4r-scd212 kernel: [71902.397053] libceph: osd20
>> 10.80.3.25:6812 socket closed (con state OPEN)
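
For anyone who wants to pull just the call chain out of a soft lockup dump like the one at the top of this message, here is a minimal Python sketch. The function name parse_frames, the regex, and the "kernel" label for frames without a module are my own assumptions for illustration, not an existing tool. It only handles the "[<addr>] symbol+0xoff/0xlen [module]" frame format seen in this dump, and the sample text is a shortened copy of the trace above.

```python
import re

# Matches one stack frame such as
#   [<ffffffffc0565e0b>] transport_wait_for_tasks+0xbb/0x150 [target_core_mod]
# A "?" before the symbol marks a frame the unwinder is not sure about.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<maybe>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module, reliable) for each frame after 'Call Trace:'."""
    _, sep, trace = log_text.partition("Call Trace:")
    if not sep:
        return []
    return [
        # Frames with no [module] suffix are labelled "kernel" (assumed label).
        (m.group("sym"), m.group("mod") or "kernel", m.group("maybe") is None)
        for m in FRAME_RE.finditer(trace)
    ]


SAMPLE = """\
kernel: [129694.169160] Call Trace:
kernel: [129694.169183] [<ffffffffc0565e0b>] transport_wait_for_tasks+0xbb/0x150 [target_core_mod]
kernel: [129694.169199] [<ffffffffc05bf418>] ? iscsit_remove_cmd_from_response_queue+0xe8/0x120 [iscsi_target_mod]
kernel: [129694.169213] [<ffffffffc0567fd5>] transport_generic_free_cmd+0xc5/0xe0 [target_core_mod]
kernel: [129694.169270] [<ffffffff8109cdc9>] kthread+0xc9/0xe0
"""

if __name__ == "__main__":
    for sym, mod, reliable in parse_frames(SAMPLE):
        marker = "  " if reliable else "? "
        print(f"{marker}{sym} [{mod}]")
```

Run against the full dump, this should list the reliable frames from transport_wait_for_tasks down through iscsit_close_connection to iscsi_target_tx_thread, which is the connection-teardown path the iscsi_ttx thread was in when the watchdog fired.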