On Mar 28, 8:53pm, Vladislav Bolkhovitin wrote:
} Subject: Re: [Scst-devel] OSS target - VMware SCSI reservation bug conform

> Dr. Greg Wettstein, on 03/27/2014 11:21 AM wrote:
> > Hi, hope the week is going well for everyone.
> >
> > There appears to be evidence that VMware has an issue with exact SCSI
> > standards compliance when it comes to handling corner cases with SCSI
> > reservation requests. It appears as if Dell is pushing firmware hot
> > fixes for the EqualLogic controllers to work around the issue.

> Hi Greg,

Hi Vlad, thanks for taking the time to respond.

> That's interesting, but, unfortunately, your message doesn't contain
> sufficient technical details to look at this issue, if it exists. Or
> do you think we are magicians who can read minds and see through
> walls? ;)

Actually I did, but I assumed a maintenance contract would be needed
for that.

I had the following reasons for raising the issue:

1.) To find out whether anyone in the open-source storage eco-system,
    i.e. SCST, LIO or whatever, has confirmed that this is a known issue.

2.) To alert other open-source storage users/vendors that, at least
    from our experience, the problem appears to be rare but legitimate.

3.) To determine, if the bug could be found, whether things like mode
    pages would make sense in an open-source stack to address issues
    such as this.

I've had a fair amount of private feedback that the issue is likely
legitimate. I've also had feedback that there may be other issues with
VMware 'corner-case' behavior. Given the nature of the VAAI
extensions/primitives rolling out, I would anticipate that to be a
fertile area for these types of issues as well.

I'm not even sure, given the nature of the issue, whether it can be
tracked down, but everyone who is interested can now be looking for it.

Here is the essence of what we have to work with, redacted, due to the
volume of messages, down to the sentinel events.
SDS proxy: ----------------------------------------------------------------
Mar 19 21:50:30 PROXY kernel: rport-3:0-0: blocked FC remote port time out: removing target and saving binding
Mar 19 21:50:30 PROXY kernel: sd 3:0:0:0: rejecting I/O to offline device
Mar 19 21:50:32 PROXY kernel: qla2xxx [0000:04:00.1]-8009:3: DEVICE RESET ISSUED nexus=3:0:0 cmd=da751240.
..
.. Noise from QLogic adapter doing DTB reset.
..
Mar 19 21:50:40 PROXY kernel: qla2xxx [0000:04:00.1]-8018:3: ADAPTER RESET ISSUED nexus=3:0:0.
..
.. More noise from the QLogic adapter.
..
Mar 19 21:55:43 PROXY kernel: qla2xxx [0000:04:00.1]-8017:3: ADAPTER RESET SUCCEEDED nexus=3:0:0.
---------------------------------------------------------------------------

VMware logs: --------------------------------------------------------------
Mar 19 21:49:08 VMWARE1 2014-03-20T02:49:08.592Z VMWARE1 vmkernel: cpu2:8432)<6>qla2xxx 0000:42:00.0: scsi(8:0:1): Abort command succeeded -- 1 1247096017.
Mar 19 21:49:12 VMWARE1 2014-03-20T02:49:12.664Z VMWARE1 vobd: [vmfsCorrelator] 11527431227280us: [esx.problem.vmfs.heartbeat.timedout] 52f3b635-9ea13918-af49-bc305bee68bc VOLUMENAME
Mar 19 21:49:14 VMWARE1 2014-03-20T02:49:12.670Z VMWARE1 Hostd: [24FDEB90 info 'Vimsvc.ha-eventmgr'] Event 466 : Lost access to volume 52f3b635-9ea13918-af49-bc305bee68bc (VOLUMENAME) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
..
.. Noise from VMware hosts about wanting path updates (only a single path
.. to the proxy), heartbeat timeouts etc.
..
Mar 19 21:49:32 VMWARE1 2014-03-20T02:49:32.911Z VMWARE1 vmkernel: cpu10:8202)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path vmhba2:C0:T0:L1, reservation state on device eui.6665356665393330 is unknown.
..
.. More noise from VMware hosts, additional reservation failures etc.
..
Mar 19 21:50:32 VMWARE2 2014-03-20T02:50:30.554Z VMWARE2 vmkwarning: cpu22:8237)WARNING: HBX: 564: Volume 52f3b635-9ea13918-af49-bc305bee68bc ("VOLUMENAME") may be damaged on disk. Corrupt heartbeat detected at offset 3653632: [HB state 0 offset 0 gen 0 stampUS 0 uuid 00000000-00000000-0000
---------------------------------------------------------------------------

And at that point things were pretty much over.

I would certainly be open to suggestions on how to track down or obtain
useful information for you. The SDS proxy was sustaining about 350
megabytes/second of I/O from seven initiators, so I don't think turning
on target mode debugging and cmd tracing is much of an option.

> Thanks,
> Vlad

Have a good weekend.

Greg

}-- End of excerpt from Vladislav Bolkhovitin

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@xxxxxxxxxxxx
------------------------------------------------------------------------------
"After being a technician for 2 years, I've discovered if people took
 care of their health with the same reckless abandon as their computers,
 half would be at the kitchen table on the phone with the hospital, trying
 to remove their appendix with a butter knife."
                                -- Brian Jones