Re: iSCSI login failure, possible race condition

Hi Maged,

On Mon, 2016-02-01 at 13:24 -0800, Maged Mokhtar wrote:
> Hello,
> 
> I am seeing frequent "iSCSI Login negotiation failed." errors when trying
> to connect a Microsoft client initiator running inside a virtual machine
> to a LIO target running in another virtual machine under VMware.
> It happens in kernels 3.12, 3.16 and 3.19. It does not happen in the
> older kernels 3.8 and 3.10.
> 
> I did some tracing and found it is related to the changes from
> PATCH-v3 0/5 9 Sep 23:38 2013 "Add support for login multi-plexing
> support".
> There seems to be a race condition that happens in the newer
> iscsi_target_nego.c code:
> 
> The successful logins happen when:
> iscsi_target_start_negotiation() calls iscsi_target_do_login(), which
> returns 0, and iscsi_target_start_negotiation() sets the
> LOGIN_FLAGS_READY flag.
> The iscsi_target_sk_data_ready() callback is then received, finds the
> LOGIN_FLAGS_READY flag set and proceeds to call
> schedule_delayed_work() to handle further negotiation.
> 
> The failed logins happen when:
> iscsi_target_start_negotiation() calls iscsi_target_do_login(), but
> before the latter returns, the iscsi_target_sk_data_ready() callback is
> received, finds the LOGIN_FLAGS_READY flag not set and exits
> without calling schedule_delayed_work(). Later iscsi_target_do_login()
> returns 0 and iscsi_target_start_negotiation() sets the
> LOGIN_FLAGS_READY flag, but it is too late.

Note struct sock->sk_data_ready() -> iscsi_target_sk_data_ready() is
invoked by net/ipv4/tcp_input.c code anytime data is ready to be
received.

The fact that LOGIN_FLAGS_READY is set in iscsi_target_start_negotiation()
after iscsi_target_do_login() returns doesn't make a difference, because
iscsi_target_sk_data_ready() will keep getting called until the payload is
pulled out of the socket's receive buffer with sock_recvmsg().
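
To make the two orderings discussed above concrete, here is a minimal
userspace sketch in C. It is not the iscsi_target_nego.c code: struct
login_model, model_data_ready() and the run_*() helpers are hypothetical
names invented for illustration, and the redeliver parameter simply encodes
the assumption in question, namely whether the data-ready callback fires
again while unread payload is still sitting in the receive buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the login state; not a kernel structure. */
struct login_model {
    bool ready;          /* stands in for LOGIN_FLAGS_READY */
    int  rx_pending;     /* bytes waiting in the modelled receive buffer */
    int  work_scheduled; /* modelled schedule_delayed_work() calls */
};

/* Models the data-ready callback: it only schedules work if the flag is set. */
void model_data_ready(struct login_model *m)
{
    assert(m != NULL);
    if (!m->ready)
        return;          /* flag not set yet: exit without scheduling */
    m->work_scheduled++;
}

/* Ordering of the successful logins: flag first, callback second. */
int run_success_order(void)
{
    struct login_model m = { .ready = false, .rx_pending = 0, .work_scheduled = 0 };

    m.ready = true;       /* start_negotiation sets the flag */
    m.rx_pending = 48;    /* arbitrary payload size; data arrives afterwards */
    model_data_ready(&m); /* callback sees the flag and schedules work */
    return m.work_scheduled;
}

/*
 * Ordering of the failed logins: callback first, flag second.
 * redeliver is an assumption, not an observed fact: it says whether the
 * callback is invoked again while rx_pending is still non-zero.
 */
int run_failing_order(bool redeliver)
{
    struct login_model m = { .ready = false, .rx_pending = 48, .work_scheduled = 0 };

    model_data_ready(&m); /* early callback: flag clear, nothing scheduled */
    m.ready = true;       /* flag is set only after do_login returns */
    if (redeliver && m.rx_pending > 0)
        model_data_ready(&m);
    return m.work_scheduled;
}
```

In this model run_success_order() returns 1, run_failing_order(true)
returns 1 and run_failing_order(false) returns 0, so the outcome of the
early callback depends entirely on whether it is invoked again before the
payload is read.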

> 
> The VMware environment could be a factor, especially since the iSCSI
> initiator client and the LIO target are 2 different VMs running
> inside the same physical VMware host. The data ready callback could be
> quicker in this case than if they were 2 real/non-VM machines. I
> tried different LAN speed settings in VMware to delay this
> callback, with no success.
> Still, I believe this VMware environment should be supported. I am
> surprised no one has seen this before; I can reproduce it on different
> machines and have tried the different versions listed above.
> 
> Is there an easy fix for the above issue please ?

So from first glance, it does sound like something specific to the type
of NIC emulation within the VMware guest.

Please confirm what NIC emulation you are using, and whether using a
different type of NIC emulation has any effect.

Thanks for reporting.

--nab
