Re: Connectivity problems with ISCSI target and ESXi server(s)

Hi Charalampos,

On Tue, 2014-05-20 at 15:24 +0300, Charalampos Pournaris wrote:
> Hi,
> 
> This is my first mail here so please accept my apologies if this the
> wrong mailing list.
> 
> I have the following setup on which I'm observing connectivity issues
> with an ISCSI target:
> 
> 4 ESXi servers (5.5)
> 1 ISCSI target implementation based on targetcli (Linux xxx-iscsi-vm
> 3.13-1-amd64 #1 SMP Debian 3.13.10-1 (2014-04-15) x86_64 GNU/Linux)
> 
> The ESXi servers are connected to the ISCSI target via a dedicated
> network (JUMBO frames enabled).
> 
> After the initial setup everything seemed to be working fine, the
> device was recognized by the ESX servers and I was able deploy/create
> VMs on the VMFS formatted ISCSI datastore. However, after using the
> setup for a while (eg. after a few days) the hosts started losing
> connectivity to ISCSI and the device now shows as inactive (Dead or
> Error state) in the ESXs.
> 
> From the ISCSI side, using dmesg I get messages similar to the following:
> 
> May 20 16:08:45 sof-24378-iscsi-vm kernel: [419952.703493] iSCSI Login
> negotiation failed.
> May 20 16:09:00 sof-24378-iscsi-vm kernel: [419967.751926] iSCSI Login
> timeout on Network Portal 10.23.84.24:3260
> May 20 16:09:00 sof-24378-iscsi-vm kernel: [419967.753559] tx_data
> returned -32, expecting 48.
> May 20 16:09:00 sof-24378-iscsi-vm kernel: [419967.754950] iSCSI Login
> negotiation failed.
> May 20 16:09:00 sof-24378-iscsi-vm kernel: [419967.756498] rx_data
> returned -104, expecting 48.
> May 20 16:09:00 sof-24378-iscsi-vm kernel: [419967.757941] iSCSI Login
> negotiation failed.
> May 20 16:09:15 sof-24378-iscsi-vm kernel: [419982.803691] iSCSI Login
> timeout on Network Portal 10.23.84.24:3260
> May 20 16:09:15 sof-24378-iscsi-vm kernel: [419982.805507] tx_data
> returned -32, expecting 48.
> May 20 16:09:15 sof-24378-iscsi-vm kernel: [419982.806948] iSCSI Login
> negotiation failed.
> 
> You can find the full log in the link below:
> 
> http://pastebin.com/AqqJaYVX
> 
> Could you please provide some help on identifying whether this
> problems comes from the ESX side or ISCSI target and possible
> solutions/workarounds?
> 
> Many thanks for your time and help!
> 

So there are two issues going on here.

First, after repeated ABORT_TASKs (e.g. timeouts on the ESX host), it
appears you're able to hit a bug in the TMR logic where an ABORT_TASK
response never gets sent.  This blocks session reinstatement, so new
logins cannot be serviced, resulting in the repeated login negotiation
failures reported above.
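
As a side note for reading the log: the negative values after tx_data and
rx_data are kernel errno codes. Below is a minimal Python sketch that
decodes them, assuming the standard Linux errno numbering; the decode()
helper and the LINUX_ERRNO table are illustrative names, not part of any
target tooling, and the sample lines are the wrapped lines from the log
above joined back together. The expected length of 48 matches the size
of the iSCSI basic header segment.

```python
import re

# Linux errno numbers seen in the log above (standard Linux numbering).
LINUX_ERRNO = {32: "EPIPE", 104: "ECONNRESET"}

# Matches e.g. "tx_data returned -32, expecting 48."
PATTERN = re.compile(r"(tx_data|rx_data) returned -(\d+), expecting (\d+)")

def decode(line):
    """Return (function, errno name, expected byte count), or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    func, err, expected = m.group(1), int(m.group(2)), int(m.group(3))
    # Fall back to the raw number for codes not in the small table.
    return func, LINUX_ERRNO.get(err, "errno %d" % err), expected

SAMPLE = [
    "[419967.753559] tx_data returned -32, expecting 48.",
    "[419967.756498] rx_data returned -104, expecting 48.",
]

for line in SAMPLE:
    print(decode(line))
```

EPIPE and ECONNRESET both mean the peer had already closed or reset the
TCP connection by the time the target tried to send or receive the login
PDU, which fits the initiator giving up after the login timeout.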

Second, based upon the full logs, the number of continuous ABORT_TASKs
being generated indicates some form of network connectivity issue
and/or a backend device that is taking a long time to respond.

Along with the comments from Thomas, to further isolate the problem
space for the second item I'd recommend disabling jumbo frames on both
ends and on the switch, as well as disabling DelayedACK on the ESX host
side.

Also, another quick way to verify whether the connectivity issue is
jumbo frames related is to run vmkping with different packet sizes, as
described here:

https://communities.vmware.com/thread/459101
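
To make the packet sizes concrete, here is a small Python sketch that
works out the largest ICMP payload fitting in one unfragmented packet
for a given MTU and prints the matching vmkping invocation. It assumes
IPv4 without options (20-byte header) plus the 8-byte ICMP header, and
uses the network portal address from the log above as the target; the
helper names are made up for illustration.

```python
IP_HEADER = 20    # IPv4 header without options, in bytes
ICMP_HEADER = 8   # ICMP echo header, in bytes

def icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    return mtu - IP_HEADER - ICMP_HEADER

def vmkping_command(target, mtu):
    # -d sets the don't-fragment bit, -s sets the ICMP payload size.
    return "vmkping -d -s %d %s" % (icmp_payload(mtu), target)

PORTAL = "10.23.84.24"  # network portal address from the log above

for mtu in (1500, 9000):
    print(vmkping_command(PORTAL, mtu))
```

If the 1472-byte ping gets through but the 8972-byte one does not, some
hop on the path is not passing jumbo frames end to end.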

So I'd recommend confirming the networking related items first, and then
looking at potential latency issues with your backend storage under
heavy load.

--nab




