Re: [Scst-devel] OSS target - VMware SCSI reservation bug conformity.

On Mar 28,  8:53pm, Vladislav Bolkhovitin wrote:
} Subject: Re: [Scst-devel] OSS target - VMware SCSI reservation bug conform

> Dr. Greg Wettstein, on 03/27/2014 11:21 AM wrote:
> > Hi, hope the week is going well for everyone.
> >
> > There appears to be evidence that VMware has an issue with exact SCSI
> > standards compliance when it comes to handling corner cases with SCSI
> > reservation requests.  It appears as if Dell is pushing firmware hot
> > fixes for the EqualLogic controllers to work around the issue.

> Hi Greg,

Hi Vlad, thanks for taking the time to respond.

> That's interesting, but, unfortunately, your message doesn't contain
> sufficient technical details to look at this issue, if it exists. Or
> do you think we are magicians who can read minds and see through
> walls? ;)

Actually I did, but I assumed a maintenance contract would be needed
for that.

I had the following reasons for raising the issue:

	1.) Does anyone in the open-source storage ecosystem,
	    i.e. SCST/LIO or whatever, have any confirmation that this
	    is a known issue?

	2.) To alert other open-source storage users/vendors that, at
	    least from our experience, it appears as if the problem
	    may be rare but legitimate.

	3.) To determine, if the bug could be found, whether things
	    like mode pages would make sense in an open-source stack
	    to address issues such as this.

I've had a fair amount of private feedback that there is a good chance
the issue may be legitimate.  I've also had feedback that there may be
other issues with VMware 'corner-case' behavior.  Given the nature of
the VAAI extensions/primitives rolling out I would anticipate that to
be a fertile area for these types of issues as well.

I'm not even sure, given the nature of the issue, if it could be
tracked down but everyone who is interested in the issue can now be
looking for it.

Here is the essence of what we have to work with, reduced, due to the
volume of messages, to sentinel events.

SDS proxy: ----------------------------------------------------------------
Mar 19 21:50:30 PROXY kernel: rport-3:0-0: blocked FC remote port time out: removing target and saving binding
Mar 19 21:50:30 PROXY kernel: sd 3:0:0:0: rejecting I/O to offline device
Mar 19 21:50:32 PROXY kernel: qla2xxx [0000:04:00.1]-8009:3: DEVICE RESET ISSUED nexus=3:0:0 cmd=da751240.
..
.. Noise from Qlogic adapter doing DTB reset.
..
Mar 19 21:50:40 PROXY kernel: qla2xxx [0000:04:00.1]-8018:3: ADAPTER RESET ISSUED nexus=3:0:0.
..
.. More noise from the Qlogic adapter.
..
Mar 19 21:55:43 PROXY kernel: qla2xxx [0000:04:00.1]-8017:3: ADAPTER RESET SUCCEEDED nexus=3:0:0.
---------------------------------------------------------------------------

VMware logs: --------------------------------------------------------------
Mar 19 21:49:08 VMWARE1 2014-03-20T02:49:08.592Z VMWARE1 vmkernel: cpu2:8432)<6>qla2xxx 0000:42:00.0: scsi(8:0:1): Abort command succeeded -- 1 1247096017.
Mar 19 21:49:12 VMWARE1 2014-03-20T02:49:12.664Z VMWARE1 vobd:  [vmfsCorrelator] 11527431227280us: [esx.problem.vmfs.heartbeat.timedout] 52f3b635-9ea13918-af49-bc305bee68bc VOLUMENAME
Mar 19 21:49:14 VMWARE1 2014-03-20T02:49:12.670Z VMWARE1 Hostd: [24FDEB90 info 'Vimsvc.ha-eventmgr'] Event 466 : Lost access to volume 52f3b635-9ea13918-af49-bc305bee68bc (VOLUMENAME) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
..
.. Noise from VMWARE hosts about wanting path updates - only single path
.. to proxy, heartbeat timeouts etc.
..
Mar 19 21:49:32 VMWARE1 2014-03-20T02:49:32.911Z VMWARE1 vmkernel: cpu10:8202)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba2:C0:T0:L1, reservation state on device eui.6665356665393330 is unknown.
..
.. More noise from VMWARE hosts, additional reservation failures etc.
..
Mar 19 21:50:32 VMWARE2 2014-03-20T02:50:30.554Z VMWARE2 vmkwarning: cpu22:8237)WARNING: HBX: 564: Volume 52f3b635-9ea13918-af49-bc305bee68bc ("VOLUMENAME") may be damaged on disk. Corrupt heartbeat detected at offset 3653632: [HB state 0 offset 0 gen 0 stampUS 0 uuid 00000000-00000000-0000
---------------------------------------------------------------------------

And at that point things were pretty much over with.

I would certainly be open to suggestions on how to track or obtain
useful information for you.  The SDS proxy was sustaining about 350
megabytes/second of I/O from seven initiators so I don't think turning
on target mode debugging and cmd tracing is much of an option.
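
Short of full tracing, one low-overhead option is to reduce the
existing syslog output to sentinel events after the fact.  What follows
is a minimal sketch in Python, not a tested tool.  The pattern list is
copied from the excerpts above and is illustrative rather than
exhaustive, and sentinel_events is a hypothetical helper name, not part
of SCST or of any VMware tooling.

```python
import re

# Assumption: these substrings are taken from the log excerpts above.
# Extend the list for whatever else turns out to matter.
SENTINEL_PATTERNS = [
    r"blocked FC remote port time out",
    r"rejecting I/O to offline device",
    r"(DEVICE|ADAPTER) RESET (ISSUED|SUCCEEDED)",
    r"Abort command succeeded",
    r"esx\.problem\.vmfs\.heartbeat\.timedout",
    r"Lost access to volume",
    r"SCSI cmd RESERVE failed",
    r"Corrupt heartbeat detected",
]
SENTINEL_RE = re.compile("|".join(SENTINEL_PATTERNS))


def sentinel_events(lines):
    """Yield only the log lines that match one of the sentinel patterns."""
    for line in lines:
        if SENTINEL_RE.search(line):
            yield line.rstrip("\n")
```

Passing an open log file to sentinel_events() and printing the results
yields only the matching lines, which works on logs already collected
and adds no load on the target.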

> Thanks,
> Vlad

Have a good weekend.

Greg

}-- End of excerpt from Vladislav Bolkhovitin

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@xxxxxxxxxxxx
------------------------------------------------------------------------------
"After being a technician for 2 years, I've discovered if people took
 care of their health with the same reckless abandon as their computers,
 half would be at the kitchen table on the phone with the hospital, trying
 to remove their appendix with a butter knife."
                                -- Brian Jones