On Wed, 2014-06-11 at 18:34 -0700, Nicholas A. Bellinger wrote:
> Hi Harry,
> 
> Re-adding the linux-scsi CC, minus your log link.
> 
> On Thu, 2014-06-12 at 01:29 +0300, Charalampos Pournaris wrote:
> > Hi Nicholas,
> > 
> > I will apply the patches as soon as possible, thank you for your prompt
> > response. It's a production environment and thus I cannot recompile the
> > kernel on demand, unfortunately.
> > In the meantime, I've hit another kernel panic in one of the iSCSI VMs.
> > I've collected the logs from that machine. Would you mind taking a look
> > if you find some time? It'd be interesting to find out whether it's a
> > problem with a driver or whether it derives from the iSCSI code. Again,
> > the kernel didn't have the patches provided in your last reply at the
> > time of the failure.
> > 
> > Here's the link:
> 
> <SNIP>
> 
> > sha1sum: 557f45b17316b4d4651dd010ea85b00a6c034e64
> > 
> 
> So the same thing is happening as before:
> 
> Jun 11 16:06:35 sof-24381-iscsi-vm kernel: [795215.637642] ABORT_TASK: Found referenced iSCSI task_tag: 19042955
> Jun 11 16:06:35 sof-24381-iscsi-vm kernel: [795215.637647] wait_for_tasks: Stopping ffff8801ed4e8350 ITT: 0x0122928b i_state: 6, t_state: 5, CMD_T_STOP
> 
> An ABORT_TASK is received for an I/O that's in TRANSPORT_PROCESSING state.
> 
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.212926] INFO: task kworker/u4:0:3539 blocked for more than 120 seconds.
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.213862] Not tainted 3.14.4-ddebug-on #1
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.214580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215882] kworker/u4:0 D ffff880236d46468 0 3539 2 0x00000000
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215931] Workqueue: tmr-iblock target_tmr_work [target_core_mod]
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215934] ffff880236d46050 0000000000000046 0000000000014380 0000000000014380
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215936] ffff8800bafa1fd8 ffff880236d46050 ffff8801ed4e8480 ffff8800bafa1d40
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215938] ffff8801ed4e8478 ffff880236d46050 0000000000000005 000000000122928b
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215940] Call Trace:
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215968] [<ffffffff814c60c9>] ? schedule_timeout+0x209/0x2a0
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215980] [<ffffffff812a34c9>] ? __dynamic_pr_debug+0x79/0x80
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215984] [<ffffffff814c7650>] ? wait_for_completion+0xa0/0x110
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.215999] [<ffffffff8108f680>] ? wake_up_state+0x10/0x10
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216004] [<ffffffffa03d2950>] ? transport_wait_for_tasks+0x60/0x130 [target_core_mod]
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216009] [<ffffffffa03cfe4e>] ? core_tmr_abort_task+0x14e/0x220 [target_core_mod]
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216013] [<ffffffffa03d2af7>] ? target_tmr_work+0xa7/0xc0 [target_core_mod]
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216024] [<ffffffff81079c2d>] ? process_one_work+0x16d/0x420
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216027] [<ffffffff8107a7e6>] ? worker_thread+0x116/0x3b0
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216029] [<ffffffff8107a6d0>] ? rescuer_thread+0x330/0x330
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216032] [<ffffffff81080a71>] ? kthread+0xc1/0xe0
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216034] [<ffffffff810809b0>] ? kthread_create_on_node+0x180/0x180
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216039] [<ffffffff814d1d8c>] ? ret_from_fork+0x7c/0xb0
> Jun 11 16:09:21 sof-24381-iscsi-vm kernel: [795382.216041] [<ffffffff810809b0>] ? kthread_create_on_node+0x180/0x180
> 
> And the ABORT_TASK waits for this I/O to come back, but according to the
> log it never completes from the backend + underlying LLD, which
> eventually causes the hung task logic to kick in. The last time-stamp in
> the log is 16:18:17, which means between the ABORT_TASK at 16:06:35 and
> 16:18:17, ~12 minutes have elapsed with no response from the underlying
> backend storage for the I/O in question.
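
To spell out why that kworker sits there: per the call trace above,
target_tmr_work() -> core_tmr_abort_task() ends up in
transport_wait_for_tasks(), which flags the command with CMD_T_STOP and then
sleeps on a completion that only fires once the backend (IBLOCK -> block
layer -> LLD) finally returns the outstanding I/O. Below is a stripped-down
userspace model of that wait, for illustration only: the struct, field and
function names are made up and this is not the actual target_core code.

/*
 * Userspace model of the ABORT_TASK wait.  In the kernel the wait is
 * unbounded; the 3 second timeout here only exists to keep the demo short
 * and to show where the hung_task detector would fire after 120s.
 *
 * Build with:  gcc -Wall -pthread abort_wait_model.c -o abort_wait_model
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

struct se_cmd_model {
	pthread_mutex_t lock;
	pthread_cond_t  stop_comp;     /* stands in for t_transport_stop_comp */
	bool            cmd_t_stop;    /* stands in for the CMD_T_STOP flag   */
	bool            backend_done;  /* set when the LLD completes the I/O  */
};

/* What the backend completion path would do.  It is never called in this
 * run, which is exactly the situation in the logs above. */
static void backend_complete(struct se_cmd_model *cmd)
{
	pthread_mutex_lock(&cmd->lock);
	cmd->backend_done = true;
	pthread_cond_broadcast(&cmd->stop_comp);
	pthread_mutex_unlock(&cmd->lock);
}

/* The tmr-iblock worker handling ABORT_TASK: flag the command to stop,
 * then sleep until the backend completes it. */
static void *abort_task_worker(void *arg)
{
	struct se_cmd_model *cmd = arg;
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 3;

	pthread_mutex_lock(&cmd->lock);
	cmd->cmd_t_stop = true;
	printf("ABORT_TASK: waiting for outstanding I/O to come back...\n");
	while (!cmd->backend_done) {
		if (pthread_cond_timedwait(&cmd->stop_comp, &cmd->lock,
					   &deadline) != 0) {
			printf("still waiting: this is where kworker/u4:0 "
			       "sits, and hung_task fires after 120s\n");
			break;
		}
	}
	pthread_mutex_unlock(&cmd->lock);
	return NULL;
}

int main(void)
{
	struct se_cmd_model cmd = {
		.lock      = PTHREAD_MUTEX_INITIALIZER,
		.stop_comp = PTHREAD_COND_INITIALIZER,
	};
	pthread_t tmr;

	(void)backend_complete;  /* the backend never calls this in the hung case */
	pthread_create(&tmr, NULL, abort_task_worker, &cmd);
	pthread_join(tmr, NULL);
	return 0;
}
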
> The three patches mentioned previously will address the case where an
> ABORT_TASK hangs indefinitely when a session reset also occurs, but
> AFAICT no session reset is occurring in your logs, so you're not hitting
> this specific bug.
> 
> So based on the evidence, it still appears that your backend is not
> (ever..?) returning outstanding I/Os to IBLOCK, which means there is
> some type of bug in the backend storage outside of the target layer.
> 
> At some point a real OOPs does occur, and not just the hung_task
> warnings above. This OOPs is not present in the kern.log output, most
> likely because it involved the underlying storage driver.
> 
> That said, I'd strongly recommend attaching a virtual serial console to
> the VMware guest, and figuring out where the real OOPs is occurring by
> redirecting console output with console=ttyS0.
> 
> My guess at this point is that it's something related to the VMware
> PVSCSI driver.
> 

And yes, I'd still recommend applying the three patches regardless.  ;)

--nab
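
For reference, the console redirect usually boils down to the lines below.
The grub file location and the .vmx serial port options are assumptions
about this particular setup (Debian-style grub config, file-backed virtual
serial port), so double-check them against your environment before applying:

# In the guest: /etc/default/grub, then run update-grub and reboot
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8"

# In the VM's .vmx (guest powered off): back a virtual serial port with a file
serial0.present  = "TRUE"
serial0.fileType = "file"
serial0.fileName = "serial0.log"

With that in place the full OOPs, including anything from the storage
driver, ends up in serial0.log even when it never makes it into kern.log.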