Re: Connectivity problems with ISCSI target and ESXi server(s)


 



On Tue, Jun 3, 2014 at 10:37 PM, Nicholas A. Bellinger
<nab@xxxxxxxxxxxxxxx> wrote:
> On Sat, 2014-05-31 at 13:03 +0300, Charalampos Pournaris wrote:
>> On Sun, May 25, 2014 at 9:42 AM, Thomas Glanzmann <thomas@xxxxxxxxxxxx> wrote:
>> > Hallo Nab,
>> >
>> >> Usually spreading out the LUNs across multiple endpoints (eg: multiple
>> >> IQNs) helps to reduce or completely eliminate the ESX host side issues.
>> >> Typically you want to keep it around 4-5 LUNs per endpoint on 1 Gb/sec
>> >> ports.
>> >
>> > in my case I had 3 LUNs per endpoint. With rts/os with the same config
>> > it worked. But as soon as my time permits it, I'll hunt it down.
>> >
>> > Cheers,
>> >         Thomas
>>
>> Hi again,
>>
>> I managed to enable dynamic debugging on the Debian VMs as Thomas
>> suggested and I've got some logs as the problem manifested itself
>> again. More specifically, I started seeing datastores losing
>> connection to the attached hosts and at some point the VM that hosts
>> the iSCSI target crashed with a kernel panic.
>>
>> I've got an enormous amount of logs but I'm not quite sure which ones
>> would be more helpful to you: kern.log, debug, syslog, messages, or all
>> of them? Gzipped, all of them together are around 700 MB (unzipped, each
>> one is around 2 GB). Would you be able to have a look at such big
>> files? (I guess by targeting some specific output)
>>
>> By the way, I had also created multiple IQNs (one for each host
>> accessing the LUN) as per Nicholas' suggestion (if I got it right). It
>> can be seen below:
>>
>
> <SNIP>
>
> The configuration looks fine.  Thanks for including the extra info..
>
>>
>> If for some reason the formatting is not displayed properly, or for
>> better readability, check this screenshot:
>> http://postimg.org/image/e358ov40r/full/
>>
>> Thank you in advance for your help.
>>
>
> Ok, so looking at these logs it's apparent that there are significantly
> fewer occurrences of ABORT_TASK.  In fact, AFAICT there is only a single
> occurrence of ABORT_TASK in the entire log.
>
> This could be attributed to the reconfiguration to use a single LUN per
> endpoint, to avoid the false positive timeout issues that ESX is known
> to generate with multiple LUNs per TargetName+TargetPortalGroupTag
> endpoint..
>
> However, looking at the single instance of ABORT_TASK in the log,
> something else appears to be happening with your backend:
>
> May 30 08:54:58 sof-24378-iscsi-vm kernel: [105260.032235] Got Task Management Request ITT: 0x0027a82c, CmdSN: 0x3da72700, Function: 0x01, RefTaskTag: 0x0027a814, RefCmdSN: 0x25a72700, CID: 0
> May 30 08:54:58 sof-24378-iscsi-vm kernel: [105260.032266] ABORT_TASK: Found referenced iSCSI task_tag: 2598932
> May 30 08:54:58 sof-24378-iscsi-vm kernel: [105260.032271] wait_for_tasks: Stopping ffff8800bac15810 ITT: 0x0027a814 i_state: 6, t_state: 5, CMD_T_STOP
>
> The most interesting line is the last one wrt to wait_for_tasks..
>
> Decoded, these i_state and t_state values mean:
>
>     i_state: 6 (ISTATE_RECEIVED_LAST_DATAOUT)
>     t_state: 5 (TRANSPORT_PROCESSING)
>
> The significance of the 'TRANSPORT_PROCESSING' t_state means that an I/O
> request was dispatched to the backend (iblock/24378_iscsi), but the
> underlying storage never completes the outstanding I/O back to the
> target layer.  Or at least, this occurrence of ABORT_TASK is right near
> the end of the logs, and there is no debug output to indicate the I/O
> completion ever occurs.
>
> This usually means some type of problem with the underlying driver for
> the backend storage, as there is no legitimate reason why outstanding
> I/Os would not (eventually) be completed back to IBLOCK, be it with GOOD
> or some manner of exception status.
>
> So that said, I would start investigating the underlying LLD driver for
> iblock/24378_iscsi (/dev/sdb)..  What type of storage + LLD is it
> using..?  Is the HBA using the latest available firmware..?  Is there
> anything else special about this backend..?
>
> --nab
>

Hi Nicholas,

First of all, thanks for the detailed explanation. It seems that this
last problem we've hit is different from the one reported initially:
it caused a kernel panic, whereas the other issue makes the
datastore(s) inactive, constantly throwing a login error. By the way,
I'm using Linux VMs (Debian) to expose the iSCSI datastores (i.e.
/dev/sdb is a local drive), so no special hardware/firmware is
involved. The only possible culprit I can think of for the incomplete
I/O is the vmware-tools driver, or perhaps some kernel driver?
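In case it helps anyone scanning the logs for similar stuck commands: a
rough Python filter over the wait_for_tasks lines might look like the
sketch below. The only state names in the maps are the two values
Nicholas decoded above; any other value would need to be looked up in
the target core source, so the maps are deliberately partial.

```python
import re

# Partial decodings, taken from Nicholas's reply above; other values
# would need to be looked up in the target core kernel source.
I_STATES = {6: "ISTATE_RECEIVED_LAST_DATAOUT"}
T_STATES = {5: "TRANSPORT_PROCESSING"}

PATTERN = re.compile(
    r"wait_for_tasks: Stopping \S+ ITT: (0x[0-9a-f]+) "
    r"i_state: (\d+), t_state: (\d+)"
)

def stuck_commands(log_lines):
    """Yield (ITT, i_state name, t_state name) for each aborted command."""
    for line in log_lines:
        m = PATTERN.search(line)
        if m:
            itt = m.group(1)
            i_s, t_s = int(m.group(2)), int(m.group(3))
            yield (itt,
                   I_STATES.get(i_s, str(i_s)),
                   T_STATES.get(t_s, str(t_s)))

sample = ("May 30 08:54:58 sof-24378-iscsi-vm kernel: [105260.032271] "
          "wait_for_tasks: Stopping ffff8800bac15810 ITT: 0x0027a814 "
          "i_state: 6, t_state: 5, CMD_T_STOP")
print(list(stuck_commands([sample])))
```

Running it over kern.log would show which ITTs were still in
TRANSPORT_PROCESSING when aborted, i.e. still waiting on the backend.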

I've hit the initially reported issue again and have fresh logs to
share; I'll send a separate mail with a link to the new logs.
Additionally, when I attempted to stop the target service, it got stuck:

--------------------------------------------------------------------------------------------------------------
root@sof-24379-iscsi-vm:/var/log# /etc/init.d/target stop
Unloading fabric/configfs: Successfully released fabric:
/sys/kernel/config/target/srpt
Successfully released fabric: /sys/kernel/config/target/qla2xxx
Successfully released fabric: /sys/kernel/config/target/loopback
Successfully released fabric: /sys/kernel/config/target/fc
  [OK]
Unloading LIO-Target/ConfigFS fabric: Successfully released network
portal: 10.23.84.25:3260 created
iqn.2003-01.org.linux-iscsi.sof-24379-iscsi-vm.x8664:sn.6d8c93856c59
TPGT: 1
Successfully deleted iSCSI Initiator Mapped LUN: 0 ACL
iqn.1998-01.com.vmware:sof-24378-srv-4e00cca4 for iSCSI Target Portal
Group: iqn.2003-01.org.linux-iscsi.sof-24379-iscsi-vm.x8664:sn.6d8c93856c59
1
Successfully deleted iSCSI Initaitor ACL
iqn.1998-01.com.vmware:sof-24378-srv-4e00cca4 for iSCSI Target Portal
Group: iqn.2003-01.org.linux-iscsi.sof-24379-iscsi-vm.x8664:sn.6d8c93856c59
1
Successfully deleted iSCSI Target Logical Unit
Successfully released iSCSI Target Portal Group:
iqn.2003-01.org.linux-iscsi.sof-24379-iscsi-vm.x8664:sn.6d8c93856c59
TPGT: 1
Successfully released iSCSI Target Endpoint IQN:
iqn.2003-01.org.linux-iscsi.sof-24379-iscsi-vm.x8664:sn.6d8c93856c59
<stuck at this point>
--------------------------------------------------------------------------------------------------------------

Below you can see some of the ABORT tasks in these new logs:

--------------------------------------------------------------------------------------------------------------
....
Jun 10 11:15:40 sof-24379-iscsi-vm kernel: [843336.907166] ABORT_TASK:
Found referenced iSCSI task_tag: 7383010
Jun 10 11:15:40 sof-24379-iscsi-vm kernel: [843337.045742] ABORT_TASK:
Sending TMR_FUNCTION_COMPLETE for ref_tag: 7383010
Jun 10 11:16:09 sof-24379-iscsi-vm kernel: [843366.085476] ABORT_TASK:
Found referenced iSCSI task_tag: 14376053
Jun 10 11:16:09 sof-24379-iscsi-vm kernel: [843366.114603] ABORT_TASK:
Sending TMR_FUNCTION_COMPLETE for ref_tag: 14376053
Jun 10 11:16:09 sof-24379-iscsi-vm kernel: [843366.114637] ABORT_TASK:
Found referenced iSCSI task_tag: 14376065
Jun 10 11:16:09 sof-24379-iscsi-vm kernel: [843366.114640] ABORT_TASK:
ref_tag: 14376065 already complete, skipping
Jun 10 11:16:09 sof-24379-iscsi-vm kernel: [843366.114641] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 14376065
Jun 10 11:16:10 sof-24379-iscsi-vm kernel: [843367.116890] ABORT_TASK:
Found referenced iSCSI task_tag: 18861488
Jun 10 11:16:10 sof-24379-iscsi-vm kernel: [843367.116892] ABORT_TASK:
ref_tag: 18861488 already complete, skipping
Jun 10 11:16:10 sof-24379-iscsi-vm kernel: [843367.116895] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 18861488
Jun 10 11:16:10 sof-24379-iscsi-vm kernel: [843367.116903] ABORT_TASK:
Found referenced iSCSI task_tag: 18861490
Jun 10 11:16:10 sof-24379-iscsi-vm kernel: [843367.122593] ABORT_TASK:
Sending TMR_FUNCTION_COMPLETE for ref_tag: 18861490
Jun 10 11:16:13 sof-24379-iscsi-vm kernel: [843369.669737] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 7383497
Jun 10 11:16:13 sof-24379-iscsi-vm kernel: [843369.669747] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 7383481
Jun 10 11:16:13 sof-24379-iscsi-vm kernel: [843369.669752] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 7383486
Jun 10 11:17:12 sof-24379-iscsi-vm kernel: [843429.355248] ABORT_TASK:
Found referenced iSCSI task_tag: 7383515
Jun 10 11:17:12 sof-24379-iscsi-vm kernel: [843429.355255] ABORT_TASK:
Sending TMR_FUNCTION_COMPLETE for ref_tag: 7383515
Jun 10 11:17:12 sof-24379-iscsi-vm kernel: [843429.355259] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 7383515
Jun 10 11:17:12 sof-24379-iscsi-vm kernel: [843429.355261] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 7383515
Jun 10 11:26:43 sof-24379-iscsi-vm kernel: [844000.298874] ABORT_TASK:
Found referenced iSCSI task_tag: 18890483
Jun 10 11:26:43 sof-24379-iscsi-vm kernel: [844000.298883] ABORT_TASK:
Sending TMR_FUNCTION_COMPLETE for ref_tag: 18890483
Jun 10 11:26:43 sof-24379-iscsi-vm kernel: [844000.298888] ABORT_TASK:
Found referenced iSCSI task_tag: 18890486
Jun 10 11:26:43 sof-24379-iscsi-vm kernel: [844000.347128] ABORT_TASK:
Sending TMR_FUNCTION_COMPLETE for ref_tag: 18890486
Jun 10 11:26:43 sof-24379-iscsi-vm kernel: [844000.347134] ABORT_TASK:
Found referenced iSCSI task_tag: 18890491
Jun 10 11:26:43 sof-24379-iscsi-vm kernel: [844000.347136] ABORT_TASK:
ref_tag: 18890491 already complete, skipping
Jun 10 11:26:43 sof-24379-iscsi-vm kernel: [844000.347138] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 18890491
Jun 10 11:26:43 sof-24379-iscsi-vm kernel: [844000.347141] ABORT_TASK:
Found referenced iSCSI task_tag: 18890496
Jun 10 11:26:43 sof-24379-iscsi-vm kernel: [844000.347142] ABORT_TASK:
ref_tag: 18890496 already complete, skipping
Jun 10 11:26:43 sof-24379-iscsi-vm kernel: [844000.347144] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 18890496
...
--------------------------------------------------------------------------------------------------------------
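For what it's worth, a quick way to tally these ABORT_TASK responses
across the full logs is sketched below (assuming the mail-wrapped lines
above are first rejoined into single kernel log lines; the regex only
matches the "Sending TMR_..." format shown in the excerpts):

```python
import re
from collections import Counter

# Matches the response lines shown in the log excerpt above, e.g.
# "ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 7383010"
RESP = re.compile(r"ABORT_TASK: Sending (TMR_\w+) for ref_tag: (\d+)")

def tally_aborts(log_lines):
    """Count ABORT_TASK responses by TMR result type."""
    counts = Counter()
    for line in log_lines:
        m = RESP.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    "kernel: [843337.045742] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 7383010",
    "kernel: [843366.114641] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 14376065",
    "kernel: [843369.669737] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 7383497",
]
print(tally_aborts(sample))
# e.g. Counter({'TMR_TASK_DOES_NOT_EXIST': 2, 'TMR_FUNCTION_COMPLETE': 1})
```

A high TMR_TASK_DOES_NOT_EXIST count relative to TMR_FUNCTION_COMPLETE
would suggest ESX is retrying aborts for commands that already finished.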

Greetings,
Harry
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



