Re: Connectivity problems with ISCSI target and ESXi server(s)

Hi Harry,

Re-adding the linux-scsi CC, minus your log link.

On Thu, 2014-06-12 at 01:29 +0300, Charalampos Pournaris wrote:
> Hi Nicholas,
> 
> I will apply the patches as soon as possible, thank you for your prompt 
> response. It's a production environment and thus I cannot recompile the 
> kernel on demand, unfortunately.
> In the meantime, I've hit again a kernel panic in one of the iSCSI VMs. 
> I've collected the logs from that machine. Would you mind taking a look 
> if you find some time, it'd be interesting to find out whether it's a 
> problem with a driver or if it derives from iSCSI code? Again, the 
> kernel didn't have the patches provided in your last reply at the time 
> of the failure.
> 
> Here's the link:

<SNIP>

> sha1sum: 557f45b17316b4d4651dd010ea85b00a6c034e64
> 

So the same thing is happening as before:

Jun 11 16:06:35 sof-24381-iscsi-vm kernel: [795215.637642] ABORT_TASK: Found referenced iSCSI task_tag: 19042955
Jun 11 16:06:35 sof-24381-iscsi-vm kernel: [795215.637647] wait_for_tasks: Stopping ffff8801ed4e8350 ITT: 0x0122928b i_state: 6, t_state: 5, CMD_T_STOP

An ABORT_TASK is received for an I/O that's in TRANSPORT_PROCESSING state.
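For reference, the decimal task_tag and the hex ITT in the two log lines above are the same initiator task tag, just printed in different bases; a quick conversion confirms they match:

```shell
# task_tag 19042955 (decimal) and ITT 0x0122928b (hex) refer to the
# same command; convert the hex ITT to decimal to confirm:
printf '%d\n' 0x0122928b
# -> 19042955
# When chasing one command through a large kern.log, grepping for
# either form of the tag pulls out the related lines, e.g.:
#   grep -E 'ABORT_TASK|0122928b|19042955' kern.log
```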

Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.212926] INFO: task kworker/u4:0:3539 blocked for more than 120 seconds.
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.213862]       Not tainted 3.14.4-ddebug-on #1
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.214580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215882] kworker/u4:0    D ffff880236d46468     0  3539      2 0x00000000
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215931] Workqueue: tmr-iblock target_tmr_work [target_core_mod]
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215934]  ffff880236d46050 0000000000000046 0000000000014380 0000000000014380
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215936]  ffff8800bafa1fd8 ffff880236d46050 ffff8801ed4e8480 ffff8800bafa1d40
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215938]  ffff8801ed4e8478 ffff880236d46050 0000000000000005 000000000122928b
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215940] Call Trace:
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215968]  [<ffffffff814c60c9>] ? schedule_timeout+0x209/0x2a0
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215980]  [<ffffffff812a34c9>] ? __dynamic_pr_debug+0x79/0x80
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215984]  [<ffffffff814c7650>] ? wait_for_completion+0xa0/0x110
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215999]  [<ffffffff8108f680>] ? wake_up_state+0x10/0x10
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216004]  [<ffffffffa03d2950>] ? transport_wait_for_tasks+0x60/0x130 [target_core_mod]
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216009]  [<ffffffffa03cfe4e>] ? core_tmr_abort_task+0x14e/0x220 [target_core_mod]
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216013]  [<ffffffffa03d2af7>] ? target_tmr_work+0xa7/0xc0 [target_core_mod]
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216024]  [<ffffffff81079c2d>] ? process_one_work+0x16d/0x420
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216027]  [<ffffffff8107a7e6>] ? worker_thread+0x116/0x3b0
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216029]  [<ffffffff8107a6d0>] ? rescuer_thread+0x330/0x330
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216032]  [<ffffffff81080a71>] ? kthread+0xc1/0xe0
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216034]  [<ffffffff810809b0>] ? kthread_create_on_node+0x180/0x180
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216039]  [<ffffffff814d1d8c>] ? ret_from_fork+0x7c/0xb0
Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216041]  [<ffffffff810809b0>] ? kthread_create_on_node+0x180/0x180

And the ABORT_TASK waits for this I/O to come back, but according to the
log it never completes from the backend + underlying LLD, which
eventually causes the hung task logic to kick in.  The last time-stamp
in the log is 16:18:17, which means that between the ABORT_TASK at
16:06:35 and 16:18:17, ~12 minutes elapsed with no response from the
underlying backend storage for the I/O in question.
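For what it's worth, the gap can be computed directly from the wall-clock timestamps in the log (GNU date assumed):

```shell
# Elapsed time between the ABORT_TASK (16:06:35) and the last log
# entry (16:18:17), using GNU date's -d parsing:
start=$(date -u -d '16:06:35' +%s)
end=$(date -u -d '16:18:17' +%s)
echo "$(( end - start )) seconds"
# -> 702 seconds, i.e. 11m42s, roughly 12 minutes
```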

The three patches mentioned previously will address the case where an
ABORT_TASK hangs indefinitely when a session reset also occurs, but
AFAICT no session reset is occurring in your logs, so you're not hitting
this specific bug.

So based on the evidence, it still appears that your backend is not
(ever..?) returning outstanding I/Os to IBLOCK, which means there is
some type of bug in the backend storage outside of the target layer.

At some point a real oops does occur, and not just the hung_task
warnings above.  This oops is not present in the kern.log output, most
likely because it involved the underlying storage driver.

That said, I'd strongly recommend attaching a virtual serial console to
the VMware guest and figuring out where the real oops is occurring by
redirecting console output with console=ttyS0.
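For example, something along these lines (the .vmx option names below are from memory, so double-check them against the VMware docs for your version):

```shell
# Guest side: add a serial console to the kernel command line
# (e.g. in /etc/default/grub, then regenerate the grub config):
#   GRUB_CMDLINE_LINUX="... console=tty0 console=ttyS0,115200"
#
# Host side: back the guest's first serial port with a file in the
# .vmx, so the oops text survives the crash:
#   serial0.present  = "TRUE"
#   serial0.fileType = "file"
#   serial0.fileName = "serial-console.out"
```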

My guess at this point is that it's something related to the VMware
PVSCSI driver.

--nab

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
