Hi Maged,

On Mon, 2016-02-01 at 13:24 -0800, Maged Mokhtar wrote:
> Hello,
>
> I am seeing frequent "iSCSI Login negotiation failed." when trying to
> connect a Microsoft client initiator running inside a virtual machine
> to a lio target running in another virtual machine under VMWare.
> It happens in kernels 3.12, 3.16 and 3.19. It does not happen in older
> kernels 3.8 and 3.10.
>
> I did some tracing and found it is related to the changes from
> PATCH-v3 0/5 9 Sep 23:38 2013 "Add support for login multi-plexing
> support" .
> There seems to be a race condition that happens in the newer
> iscsi_target_nego.c code:
>
> The successful logins happen when:
> iscsi_target_start_negotiation() calls iscsi_target_do_login() which
> returns 0, iscsi_target_start_negotiation() sets the
> LOGIN_FLAGS_READY.
> iscsi_target_sk_data_ready() callback is received, finds the
> LOGIN_FLAGS_READY flag set and proceeds with calling
> schedule_delayed_work() to handle further negotiation
>
> The failed logins happen when
> iscsi_target_start_negotiation() calls iscsi_target_do_login(), but
> before the latter returns, iscsi_target_sk_data_ready() callback is
> received and finds the LOGIN_FLAGS_READY flag not set and exits
> without calling schedule_delayed_work(). Later iscsi_target_do_login()
> returns 0 and iscsi_target_start_negotiation() sets the
> LOGIN_FLAGS_READY but it is too late.

Note struct sock->sk_data_ready() -> iscsi_target_sk_data_ready() is
invoked by net/ipv4/tcp_input.c code anytime data is ready to be
received.

The fact LOGIN_FLAGS_READY is set in iscsi_target_start_negotiation()
after iscsi_target_do_login() returns doesn't make a difference, because
iscsi_target_sk_data_ready() will keep getting called until payload is
pulled out of the socket's receive buffer with sock_recvmsg().

>
> The VMWare environment could be a factor, especially that the iscsi
> initiator client and the lio target are 2 different VMs but running
> inside the same physical VMWare host.
> The data ready callback could be
> quicker in this case than if they were 2 real/non-vm machines. I
> tried to put different LAN speed settings in VMWare to delay this
> callback with no success.
> Still I believe this VMWare environment should be supported. I am
> surprised no one has seen this before, I can reproduce it on different
> machines and tried different versions as per above.
>
> Is there an easy fix for the above issue please ?

So from first glance, it does sound like something specific to the type
of NIC emulation within the VMware guest.

Please confirm what NIC emulation you are using, and whether using a
different type of NIC emulation has any effect.

Thanks for reporting.

--nab
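
For reference, the orderings discussed in this thread can be modeled in a few lines of user-space C. This is only a rough sketch, not the iscsi_target_nego.c code: struct login_model and the model_*() helpers are hypothetical stand-ins, and the model assumes the callback does nothing except check the flag and schedule work. It shows that a callback arriving before LOGIN_FLAGS_READY is set schedules nothing, and that the outcome then depends on whether another callback arrives after the flag is set.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-connection login state (not a kernel struct). */
struct login_model {
    bool flags_ready;    /* models LOGIN_FLAGS_READY */
    bool work_scheduled; /* models schedule_delayed_work() having been called */
};

/* Models iscsi_target_sk_data_ready(): exits early if the flag is not set. */
static void model_sk_data_ready(struct login_model *m)
{
    if (!m->flags_ready)
        return;
    m->work_scheduled = true;
}

/* Models iscsi_target_start_negotiation() after iscsi_target_do_login() returned 0. */
static void model_set_ready(struct login_model *m)
{
    m->flags_ready = true;
}

/*
 * Replays one login attempt.
 * callbacks_before: data-ready callbacks delivered before the flag is set.
 * callbacks_after:  data-ready callbacks delivered after the flag is set.
 * Returns true if further negotiation work was scheduled.
 */
bool model_login(int callbacks_before, int callbacks_after)
{
    struct login_model m = { false, false };

    for (int i = 0; i < callbacks_before; i++)
        model_sk_data_ready(&m);

    model_set_ready(&m);

    for (int i = 0; i < callbacks_after; i++)
        model_sk_data_ready(&m);

    return m.work_scheduled;
}
```

In this model, model_login(0, 1) returns true (the successful ordering in the report), model_login(1, 0) returns false (the failing ordering, if no further callback arrives), and model_login(1, 1) returns true (an early callback followed by a later one, which is the behaviour described in the reply above).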