(Btw, please don't top post on kernel mailing lists folks, it's annoying)

On Wed, 2015-08-19 at 12:16 -0400, Alex Gorbachev wrote:
> I have to say that changing default_cmdsn_depth did not help us with
> the abnormal timeouts, i.e. an OSD failing or some other abrupt
> event. When that happens we detect the event via ABORT_TASK, and if
> the event is transient usually nothing happens. Anything more than a
> few seconds will usually result in Ceph recovery, but ESXi gets stuck
> and never comes out of APD. It looks like it tries to establish
> another session by bombarding the target with retries and resets, and
> ultimately gives up and goes to the PDL state. Then the only option
> is a reboot.
>
> So to be clear, we have moved on from a discussion about slow storage
> to a discussion about what happens during unexpected and abnormal
> timeouts. Anecdotal evidence suggests that SCST-based systems will
> allow ESXi to recover from this condition, while ESXi does not play
> as well with LIO-based systems in those situations.
>
> What is the difference, and is there willingness to allow LIO to be
> modified to work with this ESXi behavior? Or should we ask VMware to
> do something for ESXi to play better with LIO? I cannot fix the code,
> but would be happy to be the voice of the issue via any available
> channels.
>

Based on these and earlier comments, I think there is still some
misconception about misbehaving backend devices, and about what needs
to happen in order for LIO to make forward progress during iscsi
session reinstatement. Allowing a new session login to proceed and
submit new WRITEs while the failed session's outstanding I/O has not
been completed back with exception status by the backend driver is
dangerous.
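To make the hazard concrete, here is a toy simulation in plain Python
(not LIO code; all names are invented for illustration). It models a
stale WRITE from a failed session that is still in flight in the
backend when the reinstated session writes the same LBA:

```python
# Toy model of the ordering hazard: a stale WRITE from a failed
# session completes in the backend *after* a new session's WRITE to
# the same LBA, silently clobbering the new data.

disk = {}  # LBA -> data; last completion wins on the medium

def complete_write(lba, data):
    """Backend completion handler: writes land in completion order."""
    disk[lba] = data

# Session 1 submits a WRITE to LBA 0, then the session fails before
# the backend completes it; the I/O is neither completed nor aborted.
stale_io = (0, b"old-session-data")

# Session reinstatement is allowed to proceed anyway, and the new
# session's WRITE to the same LBA completes first...
complete_write(0, b"new-session-data")

# ...then the stale WRITE finally completes, overwriting it.
complete_write(*stale_io)

assert disk[0] == b"old-session-data"  # the new WRITE was lost
```

This is exactly the reordering that quiescing the failed session's
I/O before new backend submission is meant to rule out.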
Unless previous I/Os can (eventually) be completed or aborted within
target-core before new backend driver I/O submission happens, there is
no guarantee that the stale WRITEs won't complete after subsequent new
WRITEs from a different session carrying a new command sequence
number. That means new writes can be silently lost, and it is the
reason why 'violating the spec' in this context is not allowed.

If a backend driver is not able to complete I/O before ESX's timeout
for giving up on outstanding I/O is reached, then the backend driver
needs to:

* Have a lower internal I/O timeout, so it can complete back to
  target-core with exception status before ESX gives up on iscsi
  session login attempts and the associated session I/O.

Also note that SCSI LLDs and raw block drivers work very differently
with respect to I/O timeout and reset. For underlying SCSI LLDs,
scsi_eh will attempt to reset the device in order to complete failed
I/O. Setting the scsi_eh timeout lower than the timeout at which ESX
gives up on iscsi login and fails I/O is one simple option to
consider.

However, if your LLD or the LLD's firmware doesn't *ever* complete I/O
back to scsi-core even after a reset occurs, resulting in LIO blocking
indefinitely on session reinstatement, then that is an LLD-specific
bug and it really should be fixed.

--nab

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html