RE: 3.12.5 Target Errors

> snip

> > > Just FYI, I've previously encountered some stability issues with
> > > ConnectX-3's in ethernet mode using older versions of firmware.. On my
> > > current setup 2.30.8000 <-> 2.30.8000 has been stable in ethernet mode
> > > for some time, but it probably couldn't hurt to use matching FW
> > > versions on both sides..
> > >
> > > Also, it's been reported offlist that running large MTUs with certain
> > > (non Mellanox) switches can result in various timeouts + instability.
> > > It would be worthwhile to verify those settings on both sides as well.
> > >
> > > Mellanox folks..?  Any other ideas to help debug this..?
> > >
> > > --nab
> >
> >
> > Looks like the issue was HardwareAcceleration in esx...Essentially we
> > would get the timeouts when trying to create a VM Thick Provisioned
> > Eager Zero which translated into esx sending a WRITE_SAME command.
> > Jared was doing a wireshark capture when he noticed that.
> >
> > This reminded me that we had to disable the HardwareAcceleration in
> > esx when we were doing VMMark last year.  By default, these values are
> > enabled and LIO seems to advertise that it supports hardware
> > acceleration based on the datastore characteristics in ESX.
> >
> >
> > VMFS3.HardwareAcceleratedLocking
> > DataMover.HardwareAcceleratedMove
> > DataMover.HardwareAcceleratedInit
> >
> > As soon as we disabled them, no more time out issues...
> 
> Ah yes, thanks for confirming.
> 
> > I believe WRITE_SAME/XCOPY and ATS only made it into LIO in 3.14?
> 
> So WRITE_SAME support for IBLOCK went in v3.6, along with generic
> EXTENDED_COPY + COMPARE_AND_WRITE support in v3.12.
> 
> > The question I have is where does this information belong and how can
> > one debug these issues...
> >
> 
> FYI, these VAAI primitives can also be disabled target side with device
> attributes:
> 
>   emulate_caw=0
>   emulate_3pc=0
>   max_write_same_len=0
> 
> To debug, please try ESX host settings Init=0 + Move=1 + Locking=1 to
> see if it's specific to WRITE_SAME, and separately if COMPARE_AND_WRITE
> traffic can also trigger the bug..
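
For anyone repeating this test, the three host settings can be changed from the ESXi shell. Below is a minimal Python sketch that only builds the esxcli command lines for a given combination; it does not run them. The helper name `vaai_commands` and the short keys Init/Move/Locking are my own shorthand, and the option paths are assumed to be the advanced-option forms of the three names listed earlier in this thread, so please verify them on your ESXi build.

```python
# Hypothetical helper: maps the shorthand used in this thread to the
# assumed ESXi advanced option paths and builds esxcli argv lists.
VAAI_OPTIONS = {
    "Init": "/DataMover/HardwareAcceleratedInit",     # WRITE_SAME
    "Move": "/DataMover/HardwareAcceleratedMove",     # EXTENDED_COPY
    "Locking": "/VMFS3/HardwareAcceleratedLocking",   # COMPARE_AND_WRITE
}


def vaai_commands(**settings):
    """Return one esxcli argv list per requested setting (value 0 or 1)."""
    commands = []
    for name, value in settings.items():
        if name not in VAAI_OPTIONS:
            raise KeyError(f"unknown VAAI setting: {name}")
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {value!r}")
        commands.append([
            "esxcli", "system", "settings", "advanced", "set",
            "--option", VAAI_OPTIONS[name],
            "--int-value", str(value),
        ])
    return commands


if __name__ == "__main__":
    # The combination suggested above: isolate WRITE_SAME only.
    for argv in vaai_commands(Init=0, Move=1, Locking=1):
        print(" ".join(argv))
```
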

Setting Init=0, Move=1 and Locking=1 does not trigger the timeouts. So far the issue seems to be specific to WRITE_SAME.
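
Since the problem looks specific to WRITE_SAME, the target-side equivalent would be to set only max_write_same_len=0 on the backstore and leave the ESX defaults alone. Here is a rough Python sketch of how that could be scripted. The function name `set_device_attribs` is made up for illustration, and the attrib directory is passed in by the caller because the exact configfs path depends on the HBA and device names on your system.

```python
from pathlib import Path


def set_device_attribs(attrib_dir, **attribs):
    """Write each value to <attrib_dir>/<name> and return the old values.

    attrib_dir is assumed to be a backstore device's attrib directory
    in configfs; this sketch does not try to discover it.
    """
    attrib_dir = Path(attrib_dir)
    previous = {}
    for name, value in attribs.items():
        path = attrib_dir / name
        # Remember the current value so the change can be reverted later.
        previous[name] = path.read_text().strip()
        path.write_text(f"{value}\n")
    return previous
```

Calling `set_device_attribs(attrib_dir, max_write_same_len=0)` returns the old value, which can be written back the same way once testing is done. Initiators may only notice the change after a rescan or re-login.
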

> 
> Also, what do the negotiated ImmediateData + InitialR2T parameter
> settings look like..?

Both are set to yes. I am reading these off of /sys/kernel/config/.../iqn..../param/
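
In case it helps to compare values between setups, this is roughly how the files can be dumped in one go. It is a small Python sketch that assumes the param directory contains one file per iSCSI parameter, named after the parameter; the function name `read_params` is only illustrative, and the directory is supplied by the caller because the full path differs per target. As far as I understand, these files hold what the target is configured to offer, so the value actually negotiated for a given session may differ.

```python
from pathlib import Path


def read_params(param_dir, names=("ImmediateData", "InitialR2T")):
    """Return {parameter name: value} read from files under param_dir."""
    param_dir = Path(param_dir)
    return {name: (param_dir / name).read_text().strip() for name in names}


def format_params(params):
    """Render the parameters as 'Name=Value' lines for pasting into mail."""
    return "\n".join(f"{name}={value}" for name, value in params.items())
```
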



> 
> Thanks Moussa!
> 
> --nab
> 
> PS: Also grab the EXTENDED_COPY memory leak bugfix from Mikulas:
> 
> https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?id=1e1110c43b1cda9fe77fc4a04835e460550e6b3c





