Hi Mike,

On Tue, Jun 2, 2015 at 2:45 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
> On 06/02/2015 08:11 AM, Alex Gorbachev wrote:
>> Hi Mike,
>>
>> Thank you for your responses. I have sent last night's logs in a
>> separate email. Basically, in absence of any ceph related warnings,
>> there seems to be a large set of ESXi rescans, then a message such as:
>>
>> Jun 1 22:57:03 roc-4r-scd212 kernel: [531823.664766] iSCSI Login
>> timeout on Network Portal 10.70.2.211:3260
>
> Yeah, it looks like some commands timed out, esx tried to
> abort and that failed. It then escalated to session level recovery and
> it could not relogin, because the login code was waiting for the
> outstanding tasks as we see this:

Would you say there is no graceful mechanism for LIO and ESXi to just
give up on one login and try another one?

>
>
> Jun 1 22:56:56 roc-4r-scd212 kernel: [531817.070242] Unexpected ret: -32 send data 48
> Jun 1 22:56:57 roc-4r-scd212 kernel: [531817.326345] Unexpected ret: -32 send data 48
> Jun 1 22:57:01 roc-4r-scd212 kernel: [531821.680011] Unexpected ret: -32 send data 48
> Jun 1 22:57:02 roc-4r-scd212 kernel: [531822.384283] Unexpected ret: -32 send data 48
> Jun 1 22:57:03 roc-4r-scd212 kernel: [531823.376666] Unexpected ret: -32 send data 48
> Jun 1 22:57:03 roc-4r-scd212 kernel: [531823.664766] iSCSI Login timeout on Network Portal 10.70.2.211:3260
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373632] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [iscsi_trx:23174]
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373680] Modules linked in: rbd libceph libcrc32c iscsi_target_mod target_core_file target_core_pscsi target_core_iblock target_core_mod configfs xt_multiport iptable_filter ip_tables x_tables enhanceio_rand(OE) enhanceio_lru(OE) enhanceio_fifo(OE) enhanceio(OE) ipmi_devintf ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev sb_edac edac_core mei_me mei lpc_ich ioatdma ipmi_si ipmi_msghandler 8250_fintek 8021q garp mrp stp shpchp llc wmi bonding mac_hid lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic igb usbhid mpt2sas ahci hid i2c_algo_bit mlx4_core dca libahci raid_class ptp scsi_transport_sas pps_core
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373742] CPU: 4 PID: 23174 Comm: iscsi_trx Tainted: G C OE 4.1.0-040100rc2-generic #201505032335
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373744] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373747] task: ffff88105912e4a0 ti: ffff880844a44000 task.ti: ffff880844a44000
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373749] RIP: 0010:[<ffffffff8180379d>] [<ffffffff8180379d>] _raw_spin_lock_bh+0x1d/0x70
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373758] RSP: 0018:ffff880844a47d08 EFLAGS: 00000286
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373760] RAX: 00000000abc6abc6 RBX: ffff8808446b5410 RCX: 00000000f7b8aa20
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373762] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8808446b5224
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373764] RBP: ffff880844a47d08 R08: ffff8808446b5410 R09: 0000000000000101
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373766] R10: 0000000000000004 R11: dead000000200200 R12: 0000000000000004
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373768] R13: dead000000200200 R14: 0000000000000004 R15: ffff880844a47ce8
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373770] FS: 0000000000000000(0000) GS:ffff88085fb00000(0000) knlGS:0000000000000000
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373772] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373774] CR2: 00007f19f67ff2d0 CR3: 0000000001e0f000 CR4: 00000000001407e0
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373776] Stack:
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373778] ffff880844a47d28 ffffffffc04194c7 ffff8808446b5140 ffff88085bfb7000
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373781] ffff880844a47d58 ffffffffc0424665 0000000064d064d0 ffff8808446b5140
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373784] 0000000000000001 0000000000000001 ffff880844a47d98 ffffffffc042473e
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373788] Call Trace:
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373804] [<ffffffffc04194c7>] iscsit_stop_dataout_timer+0x27/0x90 [iscsi_target_mod]
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373813] [<ffffffffc0424665>] __iscsit_free_cmd+0x85/0xa0 [iscsi_target_mod]
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373821] [<ffffffffc042473e>] iscsit_free_cmd+0xbe/0x160 [iscsi_target_mod]
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373829] [<ffffffffc042d8dc>] iscsit_close_connection+0x47c/0x770 [iscsi_target_mod]
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373837] [<ffffffffc0418c83>] iscsit_take_action_for_connection_exit+0x83/0x110 [iscsi_target_mod]
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373845] [<ffffffffc042c947>] iscsi_target_rx_thread+0x207/0x2b0 [iscsi_target_mod]
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373848] [<ffffffff817ff4fe>] ? __schedule+0x38e/0x8db
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373856] [<ffffffffc042c740>] ? iscsi_target_tx_thread+0x1d0/0x1d0 [iscsi_target_mod]
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373861] [<ffffffff8109cdc9>] kthread+0xc9/0xe0
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373865] [<ffffffff8109cd00>] ? flush_kthread_worker+0x90/0x90
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373868] [<ffffffff81803fe2>] ret_from_fork+0x42/0x70
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373871] [<ffffffff8109cd00>] ? flush_kthread_worker+0x90/0x90
> Jun 1 22:57:16 roc-4r-scd212 kernel: [531836.373872] Code: 00 00 00 00 eb ce 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 65 81 05 0f 81 80 7e 00 02 00 00 48 89 e5 b8 00 00 02 00 f0 0f c1 07 <89> c2 c1 ea 10 66 39 c2 75 02 5d c3 0f b7 ca b8 00 80 00 00 44
>
>
>
>>
>> Then more errors. I have attached the syslog from the LIO node, as
>> well as logs from one of the six ESXi nodes, but this one has the most
>> storage.
>>
>> Do you think there is an ESXi scan storm or such that breaks LIO? Or
>
> How are you detecting scans in the logs? The non existent lun messages in
> the logs, right? If so, are the scans before or after the aborts? I thought it was
> after. In the logs I was looking at the issue at around Jun 1 22:41:23, so
> I'm not sure if there was an event before the log started or not.

Right, the nonexistent LUN. There are scans before and after the aborts
(I sent you yesterday's logs via email; they show a lot of these).

>
>> is this still a ceph related issue? I did change the osd op complaint
>> time as you indicated, it was set to 30 and I set it to 10.
>>
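(For reference, the change itself is the standard one; treat the following as a
sketch rather than our exact steps, since the runtime syntax can vary a bit
between ceph releases:

    # in ceph.conf on the OSD hosts, [osd] section, then restart the OSDs:
    osd op complaint time = 10

    # or push it into the running OSDs without a restart:
    ceph tell osd.* injectargs '--osd-op-complaint-time 10'

With that in place the OSDs should start logging "slow request" complaints
once an op takes longer than 10 seconds.)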
>
> Ok, and nothing in the osd logs (on the osd boxes that would be in
> /var/log/ceph/)?

I found some oddities; I realize this is ceph related, so I will follow up appropriately:

2015-06-02 08:50:58.263988 7f3ca1c85700 10 -- 10.80.4.20:6812/3546 >> 10.80.3.25:0/4891 pipe(0xd543900 sd=241 :6812 s=4 pgs=265 cs=1 l=1 c=0x9547860).writer done
2015-06-02 08:50:58.264487 7f3ca42d9700 2 -- 10.80.3.20:6818/3546 >> 10.80.4.25:0/4933 pipe(0xda19c00 sd=26 :6818 s=2 pgs=242 cs=1 l=1 c=0x7382160).reader couldn't read tag, (0) Success
2015-06-02 08:50:58.264513 7f3ca42d9700 2 -- 10.80.3.20:6818/3546 >> 10.80.4.25:0/4933 pipe(0xda19c00 sd=26 :6818 s=2 pgs=242 cs=1 l=1 c=0x7382160).fault (0) Success
2015-06-02 08:50:58.264532 7f3ca42d9700 10 -- 10.80.3.20:6818/3546 >> 10.80.4.25:0/4933 pipe(0xda19c00 sd=26 :6818 s=2 pgs=242 cs=1 l=1 c=0x7382160).fault on lossy channel, failing
2015-06-02 08:50:58.264540 7f3ca42d9700 10 -- 10.80.3.20:6818/3546 >> 10.80.4.25:0/4933 pipe(0xda19c00 sd=26 :6818 s=2 pgs=242 cs=1 l=1 c=0x7382160).stop
2015-06-02 08:50:58.264549 7f3ca42d9700 10 -- 10.80.3.20:6818/3546 >> 10.80.4.25:0/4933 pipe(0xda19c00 sd=26 :6818 s=4 pgs=242 cs=1 l=1 c=0x7382160).unregister_pipe

> I cannot remember, did you say you were not using enhance io for this
> (it was just loaded but not used for lio) or did you try with it off?

EnhanceIO is installed, but I am not currently using any caches on any volumes.

>
> Can we replicate this in a non production box? I can send you a patch with
> a bunch of extra debugging on.

Unfortunately no, the non-production box does not see these issues.

Thank you,
Alex

>
>
>
>> Thank you,
>> Alex
>>
>> On Mon, Jun 1, 2015 at 10:55 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>>> On 06/01/2015 09:18 AM, Alex Gorbachev wrote:
>>>> Hi Mike,
>>>>
>>>> On Fri, May 29, 2015 at 3:11 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>>>>> Just want to make sure we are on the same page. The logs you have been
>>>>> posting the last week do not show crashes do they (there are no null
>>>>> pointers or kernel lock ups)? They are just showing error handling
>>>>> issues. I just want to make sure you are no longer seeing the crashes
>>>>> with the newer kernels like we saw before.
>>>>
>>>> That is true, it may be due to the newer kernels. We also implemented
>>>> a fast monitoring system, which alerts us whenever there is an issue -
>>>> so that may play a factor as we would proactively reboot or fail over
>>>> services. I did notice that after a few timeouts the system becomes
>>>> sluggish or unresponsive, and to clear that it takes a reboot. The
>>>> good news is that we are seeing fewer timeouts overall (none last
>>>> week).
>>>>
>>>>> For the timeouts/abort_tasks, the issue is either that the LIO box is locked
>>>>> up in some way that RBD IO is not executing like we saw with the
>>>>> workqueue issues in older kernels, or there is just some issue on the
>>>>> OSD boxes or connection to them that is not allowing IO to execute like
>>>>> normal. This is the normal case and for this, you need to post the logs
>>>>> on the ceph list if there is anything suspicious.
>>>>
>>>> Is there any way to tell which backstore device is causing the
>>>> timeouts by the LIO messages? Then we could map it to RBD and then
>>>> OSD and PG. As it is right now, ceph logs are not showing any errors
>>>> when these events happen.
>>>
>>> On the LIO box you could try turning on more debugging, but if you just
>>> look on the ESXi box vmkernel log you have the info.
>>>
>>> On the ESXi box in the logs you see error messages with the device id like:
>>>
>>> naa.6001405a37e54d400000000000000000
>>>
>>> or something with the host, channel, target, lun identifier like
>>>
>>> iscsi_vmk:
>>> iscsivmk_TaskMgmtIssue: vmhba33:CH:3 T:22 L:1 : Task mgmt "Abort Task"
>>> with itt=0x10d4450 (refITT=0x10d444f) timed ou
>>>
>>> You then just map that to the device on the LIO box. So for
>>> example, if you know the LUN (1 in the example above), then just look at
>>> the LIO target config and see what rbd device you exported as lun 1.
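Thanks, that helps. As a rough sketch of that lookup on the LIO box, assuming
the rbd devices are exported through iblock backstores (the IQN and backstore
name below are placeholders, adjust to the actual config):

    # show which backstore each LUN maps to:
    targetcli ls /iscsi/iqn.2015-01.com.example:target1/tpg1/luns
    # or follow the configfs symlink for, say, LUN 1:
    ls -l /sys/kernel/config/target/iscsi/iqn.2015-01.com.example:target1/tpgt_1/lun/lun_1/
    # the backstore's udev_path then gives the /dev/rbdX device behind it:
    cat /sys/kernel/config/target/core/iblock_0/my_backstore/udev_path

From there, rbd showmapped ties /dev/rbdX back to the pool and image, and
ceph osd map <pool> <object> narrows a given object down to its PG and OSDs.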
>>>
>>> What is your value for:
>>>
>>> osd op complaint time
>>>
>>> Lower it to around 10 or 15 secs, so it is lower than the ESXi timeout
>>> that would result in abort tasks being sent.
>> --
>> To unsubscribe from this list: send the line "unsubscribe target-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html