On Tue, May 28, 2013 at 5:38 PM, Jeremy Linton <jlinton@xxxxxxxxxxxxx> wrote:
> This is another part of what formed my opinions about error isolation. If one
> of your devices goes out to lunch and isn't recovering via abort/lun reset.
> Its done! Wrecking the rest of the SAN doing "bus resets" and HBA resets is a
> good way to take a serious problem and turn it into a full blown catastrophe.

This is the gist of the issue: once you get to an abort, you are already
screwed. You need the abort, but anything beyond it should be reserved for
when things are really dead (the HBA might still recover on a host reset, but
only do that if the host is really unresponsive).

That's why I prefer to have a long timeout for the command and a long timeout
for the abort. The application above should handle this with its own timeout
once the abort has been sent (the buffer remains locked until the abort
returns). The device itself is likely stuck in error recovery, and it will
come out of it when its own internal timeouts are exhausted; those can be
infinite and are generally very large.

Baruch
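
To make the escalation policy above concrete, here is a minimal sketch of the decision logic. It is not the kernel's SCSI error handler or any real API: the names (CommandState, next_action, Action) and the 300-second defaults are hypothetical, picked only to illustrate "long command timeout, long abort timeout, host reset only when the host itself is unresponsive". Offlining the device when the abort never returns but the host is fine is my reading of the error-isolation point in the quoted text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    WAIT = "wait"                      # keep waiting, do nothing disruptive
    ABORT = "abort"                    # send an abort for this one command
    OFFLINE_DEVICE = "offline_device"  # isolate the device, leave the SAN alone
    HOST_RESET = "host_reset"          # last resort, HBA itself looks dead


@dataclass
class CommandState:
    elapsed_s: float                   # seconds since the command was issued
    abort_sent_at_s: Optional[float]   # elapsed time when abort was sent, if any
    host_responsive: bool              # does the HBA still answer at all?


def next_action(cmd: CommandState,
                cmd_timeout_s: float = 300.0,    # hypothetical "long" value
                abort_timeout_s: float = 300.0   # hypothetical "long" value
                ) -> Action:
    """Pick the next recovery step without escalating past the abort
    unless the host itself is unresponsive."""
    if cmd.abort_sent_at_s is None:
        # No abort yet: give the device its full (long) command timeout.
        if cmd.elapsed_s < cmd_timeout_s:
            return Action.WAIT
        return Action.ABORT

    # Abort is outstanding. The buffer stays locked until it returns,
    # so wait out the (long) abort timeout before doing anything else.
    if cmd.elapsed_s - cmd.abort_sent_at_s < abort_timeout_s:
        return Action.WAIT

    # Abort never came back. Only touch the HBA if it is really dead;
    # otherwise isolate the one device instead of resetting the bus.
    if not cmd.host_responsive:
        return Action.HOST_RESET
    return Action.OFFLINE_DEVICE
```

Note that there is deliberately no bus reset step in this sketch: a stuck device with a healthy host ends in OFFLINE_DEVICE, so the rest of the SAN is never disturbed.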