Re: LUNs become unavailable with current git HEAD

Hello Nab,

> If / when your able to reproduce, please make sure to enable the
> dynamic debugging for iscsi_target_mod after it's triggered to see
> what's going on..

I see. Yesterday I enabled the debugging before trying to trigger the
incident, and ended up with 1G of syslog output.

> Also as mentioned earlier, the original logs indicate that the target
> was explicitly shutdown + modules unloaded (and not restarted) almost
> immediately after the ABORT_TASKs where received, and no other errors
> / exceptions where reported. You where able to confirm that shutdown
> and non restart was expected, right..?

I checked the logs. The first log message I can see is here:

Oct 11 11:53:56 node-62 kernel: [219465.151250] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5488

I started the evaluation at the time shown in the web server log line
below (I always download the evaluation slides from my webserver directly
before starting the evaluation, because they contain a password for the
online evaluation of the class which is only valid for 60 minutes):
176.94.62.170 - - [11/Oct/2013:12:06:16 +0200]  "GET /xxx/xxx.pdf HTTP/1.1" 200 43168 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.69 Safari/537.36" "thomas.glanzmann.de"

Between 12:06:16 and 12:14:51 the participants must have complained about
non-responding ESX servers, so I checked the serial console and its
output. I remember seeing:

ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5488
and
TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000001d

I wanted to have the target back as soon as possible, so I rebooted it at
this exact time:

Oct 11 12:14:39 node-62 shutdown[9433]: shutting down for system reboot

The logs from the switch show that the ports went down 12 seconds after I
typed in 'reboot':

(infra) [/var/adm/syslog/2013/10/11] grep 'procurve-04 ports:  port 23' local6
Oct 11 12:14:51 procurve-04 ports:  port 23 in Trk10 is now off-line
Oct 11 12:14:53 procurve-04 ports:  port 23 is Blocked by LACP
Oct 11 12:14:59 procurve-04 ports:  port 23 in Trk10 is now on-line
Oct 11 12:15:03 procurve-04 ports:  port 23 in Trk10 is now off-line
Oct 11 12:15:07 procurve-04 ports:  port 23 is Blocked by LACP
Oct 11 12:15:44 procurve-04 ports:  port 23 in Trk10 is now off-line
Oct 11 12:15:49 procurve-04 ports:  port 23 is Blocked by LACP
Oct 11 12:15:50 procurve-04 ports:  port 23 in Trk10 is now on-line
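
The intervals can be checked from the log lines quoted in this mail. This
is only a minimal sketch: it assumes the clocks of node-62 and procurve-04
were in sync, and it takes the year 2013 from the web server log because
syslog timestamps carry no year. The helper name parse_syslog_ts is made up
for illustration.

```python
from datetime import datetime

def parse_syslog_ts(line, year=2013):
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line.

    The year is an assumption (taken from the web server log), because
    classic syslog timestamps do not include one.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

first_abort = parse_syslog_ts(
    "Oct 11 11:53:56 node-62 kernel: [219465.151250] ABORT_TASK: "
    "Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5488")
reboot = parse_syslog_ts(
    "Oct 11 12:14:39 node-62 shutdown[9433]: shutting down for system reboot")
port_down = parse_syslog_ts(
    "Oct 11 12:14:51 procurve-04 ports:  port 23 in Trk10 is now off-line")

# Seconds between typing 'reboot' and the first port going off-line.
reboot_to_port_down = (port_down - reboot).total_seconds()
# Time the target sat between the first ABORT_TASK and the reboot.
abort_to_reboot = reboot - first_abort

print(reboot_to_port_down)
print(abort_to_reboot)
```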

What is bothering me is that I not only see TMR_TASK_DOES_NOT_EXIST but
also 'Detected NON_EXISTENT_LUN Access', so to me it looks like the target
forgot about the LUNs it had configured _without_ me doing anything.

So the target was not restarted immediately but sat in that state for
around 20 minutes, from 11:53:56 till 12:14:39. But I think it was still
operational in that state, because if you lose access to all paths of a
LUN (APD, All Paths Down) while working with an ESX server, you notice
immediately: tasks no longer make any progress and everything becomes
sluggish (no response to commands given). Participants would have
complained to me earlier. We had configured multipathing in round robin,
which means we had 12 targets; 5 devices; 20 paths (4 per device over 2
portals using two initiators). See also PDF page 28 labeled 'iscsi
multipathing' for the setup.

https://thomas.glanzmann.de/tmp/whiteboard.pdf
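
As a sanity check on the path count, here is a minimal sketch. The counts
are the ones given above; that the 4 paths per device are the product of
portals and initiators is an assumption about the setup, not something the
logs confirm.

```python
# Counts as stated in this mail.
devices = 5
portals = 2
initiators = 2

# Assumption: every initiator logs in through every portal, so each
# device is reachable over portals * initiators paths.
paths_per_device = portals * initiators
total_paths = devices * paths_per_device

print(paths_per_device, total_paths)
```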

The two demo mode LUNs were on one target each. The 3 LUNs private to
each ESX server were together on one target.

After the participants reported to me that they had a problem, I checked
dmesg and the serial console of the target, saw the ABORT_TASKs and
'NON_EXISTENT_LUN Access', and typed in 'reboot'. After the reboot
everything was back to normal. However, we left for lunch break and only
started working with the systems again one hour later. By that point all
the ongoing tasks in vCenter had timed out and we were able to continue
working.

Cheers,
        Thomas