Re: [Scst-devel] OSS target - VMware SCSI reservation bug conformity.

On Wed, 2014-04-02 at 15:29 -0500, Dr. Greg Wettstein wrote:
> On Mar 28,  8:53pm, Vladislav Bolkhovitin wrote:
> } Subject: Re: [Scst-devel] OSS target - VMware SCSI reservation bug conform
> 
> > Dr. Greg Wettstein, on 03/27/2014 11:21 AM wrote:
> > > Hi, hope the week is going well for everyone.
> > >
> > > There appears to be evidence that VMware has an issue with exact SCSI
> > > standards compliance when it comes to handling corner cases with SCSI
> > > reservation requests.  It appears as if Dell is pushing firmware hot
> > > fixes for the EqualLogic controllers to work around the issue.
> 
> > Hi Greg,
> 
> Hi Vlad, thanks for taking the time to respond.
> 
> > That's interesting, but, unfortunately, your message doesn't contain
> > sufficient technical details to look at this issue, if it exists. Or
> > do you think we are magicians who can read minds and see through
> > walls? ;)
> 
> Actually I did, but I assumed a maintenance contract would be needed
> for that.
> 
> I had the following reasons for raising the issue:
> 
> 	1.) Does anyone in the open-source storage ecosystem,
> 	    i.e. SCST, LIO or whatever, have any confirmation that
> 	    this is a known issue?
> 

FYI guys, AFAICT this bug is specific to targets that don't support the
VAAI Atomic Test and Set primitive (ATS, i.e. COMPARE_AND_WRITE) and so
have to fall back to legacy SCSI-2 reservations instead.

When Atomic Test and Set is available, ESX avoids using reservations to
lock the whole LUN and instead obtains exclusive access to individual
VMFS extents on a per-node basis.
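
To make the difference concrete, here is a minimal sketch of the two
locking styles against a toy in-memory LUN. Everything in it (ToyLun,
lock_record, the 512-byte block size, the status strings) is made up
for illustration; it is not how ESX or any real target implements this,
and a real target has to make the compare and the write atomic against
concurrent commands.

```python
BLOCK_SIZE = 512  # assumed block size, for illustration only


def lock_record(owner):
    """Hypothetical on-disk lock record: owner name padded to one block."""
    return owner.encode().ljust(BLOCK_SIZE, b"\0")


class ToyLun:
    """Toy LUN: a list of blocks plus one SCSI-2 style reservation holder."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.reserved_by = None

    def _conflict(self, initiator):
        # A reservation held by another initiator blocks access to the whole LUN.
        return self.reserved_by not in (None, initiator)

    def reserve(self, initiator):
        """Legacy style: take the entire LUN."""
        if self._conflict(initiator):
            return "RESERVATION CONFLICT"
        self.reserved_by = initiator
        return "GOOD"

    def release(self, initiator):
        if self.reserved_by == initiator:
            self.reserved_by = None
        return "GOOD"

    def write(self, initiator, lba, data):
        if self._conflict(initiator):
            return "RESERVATION CONFLICT"
        self.blocks[lba] = data
        return "GOOD"

    def compare_and_write(self, initiator, lba, expected, new):
        """ATS style: write only if the block still holds the expected data."""
        if self._conflict(initiator):
            return "RESERVATION CONFLICT"
        if self.blocks[lba] != expected:
            return "MISCOMPARE"
        self.blocks[lba] = new
        return "GOOD"
```

In this model two hosts can each take a lock on a different block with
compare_and_write() and neither disturbs the other, whereas a reserve()
by one host shuts the other out of every block until release().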

--nab

> 	2.) To alert other open-source storage users/vendors that, at
> 	    least from our experience, it appears as if the problem
> 	    may be rare but legitimate.
> 
> 	3.) To determine, if the bug could be found, whether things
> 	    like mode pages would make sense in an open-source stack
> 	    to address issues such as this.
> 
> I've had a fair amount of private feedback that there is a good chance
> the issue may be legitimate.  I've also had feedback that there may be
> other issues with VMware 'corner-case' behavior.  Given the nature of
> the VAAI extensions/primitives rolling out I would anticipate that to
> be a fertile area for these types of issues as well.
> 
> I'm not even sure, given the nature of the issue, if it could be
> tracked down but everyone who is interested in the issue can now be
> looking for it.
> 
> Here is the essence of what we have to work with. Due to the volume
> of messages, it has been cut down to the sentinel events.
> 
> SDS proxy: ----------------------------------------------------------------
> Mar 19 21:50:30 PROXY kernel: rport-3:0-0: blocked FC remote port time out: removing target and saving binding
> Mar 19 21:50:30 PROXY kernel: sd 3:0:0:0: rejecting I/O to offline device
> Mar 19 21:50:32 PROXY kernel: qla2xxx [0000:04:00.1]-8009:3: DEVICE RESET ISSUED nexus=3:0:0 cmd=da751240.
> ..
> .. Noise from Qlogic adapter doing DTB reset.
> ..
> Mar 19 21:50:40 PROXY kernel: qla2xxx [0000:04:00.1]-8018:3: ADAPTER RESET ISSUED nexus=3:0:0.
> ..
> .. More noise from the Qlogic adapter.
> ..
> Mar 19 21:55:43 PROXY kernel: qla2xxx [0000:04:00.1]-8017:3: ADAPTER RESET SUCCEEDED nexus=3:0:0.
> ---------------------------------------------------------------------------
> 
> VMware logs: --------------------------------------------------------------
> Mar 19 21:49:08 VMWARE1 2014-03-20T02:49:08.592Z VMWARE1 vmkernel: cpu2:8432)<6>qla2xxx 0000:42:00.0: scsi(8:0:1): Abort command succeeded -- 1 1247096017.
> Mar 19 21:49:12 VMWARE1 2014-03-20T02:49:12.664Z VMWARE1 vobd:  [vmfsCorrelator] 11527431227280us: [esx.problem.vmfs.heartbeat.timedout] 52f3b635-9ea13918-af49-bc305bee68bc VOLUMENAME
> Mar 19 21:49:14 VMWARE1 2014-03-20T02:49:12.670Z VMWARE1 Hostd: [24FDEB90 info 'Vimsvc.ha-eventmgr'] Event 466 : Lost access to volume 52f3b635-9ea13918-af49-bc305bee68bc (VOLUMENAME) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
> ..
> .. Noise from VMWARE hosts about wanting path updates - only single path
> .. to proxy, heartbeat timeouts etc.
> ..
> Mar 19 21:49:32 VMWARE1 2014-03-20T02:49:32.911Z VMWARE1 vmkernel: cpu10:8202)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba2:C0:T0:L1, reservation state on device eui.6665356665393330 is unknown.
> ..
> .. More noise from VMWARE hosts, additional reservation failures etc.
> ..
> Mar 19 21:50:32 VMWARE2 2014-03-20T02:50:30.554Z VMWARE2 vmkwarning: cpu22:8237)WARNING: HBX: 564: Volume 52f3b635-9ea13918-af49-bc305bee68bc ("VOLUMENAME") may be damaged on disk. Corrupt heartbeat detected at offset 3653632: [HB state 0 offset 0 gen 0 stampUS 0 uuid 00000000-00000000-0000
> ---------------------------------------------------------------------------
> 
> And at that point things were pretty much over with.
> 
> I would certainly be open to suggestions on how to track or obtain
> useful information for you.  The SDS proxy was sustaining about 350
> megabytes/second of I/O from seven initiators so I don't think turning
> on target mode debugging and cmd tracing is much of an option.
> 
> > Thanks,
> > Vlad
> 
> Have a good weekend.
> 
> Greg
> 
> }-- End of excerpt from Vladislav Bolkhovitin
> 
> As always,
> Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
> 4206 N. 19th Ave.           Specializing in information infra-structure
> Fargo, ND  58102            development.
> PH: 701-281-1686
> FAX: 701-281-3949           EMAIL: greg@xxxxxxxxxxxx
> ------------------------------------------------------------------------------
> "After being a technician for 2 years, I've discovered if people took
>  care of their health with the same reckless abandon as their computers,
>  half would be at the kitchen table on the phone with the hospital, trying
>  to remove their appendix with a butter knife."
>                                 -- Brian Jones
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Scst-devel mailing list
> https://lists.sourceforge.net/lists/listinfo/scst-devel
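
On the question of getting useful information without turning on
target-mode debugging: one low-cost option is to pull just the
reservation failures out of the collected vmkernel logs. The following
is a rough sketch that assumes the message format matches the RESERVE
failure line quoted above; the function name and the assumption that
the host is the fourth syslog field are mine.

```python
import re

# Pattern taken from the vmkernel NMP message quoted in this thread.
RESERVE_FAILED = re.compile(
    r"SCSI cmd RESERVE failed on path (?P<path>[^,\s]+), "
    r"reservation state on device (?P<device>\S+) is unknown"
)


def reserve_failures(lines):
    """Yield (host, path, device) for each RESERVE failure line."""
    for line in lines:
        match = RESERVE_FAILED.search(line)
        if match:
            # Assumes syslog layout "Mon DD HH:MM:SS HOST ...".
            host = line.split()[3]
            yield host, match.group("path"), match.group("device")
```

Feeding it the lines of a log file (for example an open file object)
gives one tuple per failure, which can then be lined up against the
timestamps on the target side.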





