On Sat, 2014-05-17 at 07:44 +0000, Moussa Ba (moussaba) wrote:
> > -----Original Message-----
> > From: Nicholas A. Bellinger [mailto:nab@xxxxxxxxxxxxxxx]
> > Sent: Friday, May 16, 2014 2:14 PM
> > To: Moussa Ba (moussaba)
> > Cc: Sagi Grimberg; target-devel@xxxxxxxxxxxxxxx; Nicholas Bellinger; Or
> > Gerlitz; Jared Hulbert (jehulber); Yaron Haviv; roid@xxxxxxxxxxxx; Oren
> > Duer
> > Subject: Re: 3.12.5 Target Errors
> >
> > On Fri, 2014-05-16 at 05:54 +0000, Moussa Ba (moussaba) wrote:
> > > I am able to connect to the target without issues using a CentOS
> > > initiator. It logs in quickly and I can run fio reads/writes without
> > > issues against the same target. Trying to do the same from ESX 5.5,
> > > though, results in continuous connection drops. Is there something
> > > special about the ESX initiator? I see similar issues with tgt, where
> > > it fails to log in at all.
> > >
> > > I am running out of ideas... Any suggestion is welcome.
> > >
> > >
> > > Target:
> > > FW: 2.30.8000
> > > Kernel: 3.12.9+patches
> > > ConnectX-3 cards are configured as Ethernet cards.
> > >
> > > Initiator:
> > > FW: 2.31.5050 (originally 2.11.500; I upgraded it but saw no
> > > difference and still get the same error)
> > > Driver: 1.9.10.0-1OEM.550.0.0.1331820
> > > Using iSER mode
> > >
> >
> > Just FYI, I've previously encountered some stability issues with
> > ConnectX-3s in Ethernet mode using older firmware versions. On my
> > current setup, 2.30.8000 <-> 2.30.8000 has been stable in Ethernet mode
> > for some time, but it probably couldn't hurt to use matching FW versions
> > on both sides.
> >
> > Also, it's been reported off-list that running large MTUs with certain
> > (non-Mellanox) switches can result in various timeouts and instability.
> > It would be worthwhile to verify those settings on both sides as well.
> >
> > Mellanox folks? Any other ideas to help debug this?
> >
> > --nab
>
> Looks like the issue was Hardware Acceleration in ESX. Essentially, we
> would get the timeouts when trying to create a Thick Provision Eager
> Zeroed VM, which translated into ESX sending WRITE_SAME commands.
> Jared noticed this while doing a Wireshark capture.
>
> This reminded me that we had to disable Hardware Acceleration in ESX
> when we were running VMmark last year. By default these settings are
> enabled, and, judging by the datastore details shown in ESX, LIO
> advertises that it supports hardware acceleration:
>
> VMFS3.HardwareAcceleratedLocking
> DataMover.HardwareAcceleratedMove
> DataMover.HardwareAcceleratedInit
>
> As soon as we disabled them, the timeouts went away.

Ah yes, thanks for confirming.

> I believe WRITE_SAME/XCOPY and ATS only made it into LIO in 3.14?

WRITE_SAME support for IBLOCK went in with v3.6, and generic
EXTENDED_COPY (XCOPY) + COMPARE_AND_WRITE (used by ESX for ATS) support
followed in v3.12.

> The question I have is where does this information belong and how can
> one debug these issues...

FYI, these VAAI primitives can also be disabled on the target side with
the following device attributes:

  emulate_caw=0
  emulate_3pc=0
  max_write_same_len=0

To debug, please try the ESX host settings Init=0 + Move=1 + Locking=1 to
see whether the problem is specific to WRITE_SAME, and separately check
whether COMPARE_AND_WRITE traffic can also trigger the bug.

Also, what do the negotiated ImmediateData + InitialR2T parameter
settings look like?

Thanks Moussa!

--nab

PS: Also grab the EXTENDED_COPY memory leak bugfix from Mikulas:

https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?id=1e1110c43b1cda9fe77fc4a04835e460550e6b3c
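A minimal Python sketch of applying the three target-side device attributes (emulate_caw, emulate_3pc, max_write_same_len) through LIO's configfs tree. It assumes each backstore device exposes its attributes as files under /sys/kernel/config/target/core/<hba>/<device>/attrib/ and that the script runs as root. The function name disable_vaai, the settings dict, and the example device path are hypothetical. It writes 0 to each attribute and reads the value back so you can confirm the write stuck.

```python
import os

# The three device attributes named in the thread. Writing "0" disables the
# corresponding VAAI primitive on the LIO target side. The ESX-side names in
# the comments are the host settings that exercise each SCSI command.
VAAI_OFF = {
    "emulate_caw": "0",         # COMPARE_AND_WRITE (ESX ATS, HardwareAcceleratedLocking)
    "emulate_3pc": "0",         # EXTENDED_COPY (ESX XCOPY, HardwareAcceleratedMove)
    "max_write_same_len": "0",  # WRITE_SAME (ESX HardwareAcceleratedInit)
}


def disable_vaai(attrib_dir, settings=VAAI_OFF):
    """Write each attribute value into attrib_dir and return what reads back.

    attrib_dir is assumed to be a device's configfs attribute directory,
    e.g. /sys/kernel/config/target/core/<hba>/<device>/attrib (hypothetical
    names). Raises FileNotFoundError if an attribute file is absent, which
    likely means the running kernel does not provide that attribute.
    """
    result = {}
    for name, value in settings.items():
        path = os.path.join(attrib_dir, name)
        # Refuse to create files: configfs attributes must already exist.
        if not os.path.exists(path):
            raise FileNotFoundError(path)
        with open(path, "w") as f:
            f.write(value + "\n")
        # Read back to confirm the value the kernel actually holds.
        with open(path) as f:
            result[name] = f.read().strip()
    return result
```

For example, disable_vaai("/sys/kernel/config/target/core/iblock_0/esx_lun0/attrib"), where iblock_0 and esx_lun0 are placeholders for your own HBA and device names. Reading each value back, rather than trusting the write, makes it easy to confirm which primitives are off before repeating the ESX Init/Move/Locking experiments.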