On 5/13/2013 3:29 PM, Martin K. Petersen wrote:
> others. We see cases fairly often where a misbehaving target has
> confused the HBA enough that we can not bring the device back without
> doing an HBA firmware reset. Despite I/O completing successfully on
> other targets connected to the same HBA.

This would seem to indicate an HBA/driver bug...

> So at some point we do need to give up and escalate to a full HBA
> reset. We would just like to defer that hammer until we have run out of
> other options.

Except that I've seen the Linux error recovery cause more problems than it solves on a fairly regular basis. I would rather have a solution designed to isolate failures than one that makes a lot of mistakes and causes further problems (sometimes with other machines). I'm pretty convinced that attempting everything possible to recover a device when the underlying problem is unknown is a bad strategy.

I think maybe it's a difference in perspective. If the failing device is an OS disk, then giving up is tantamount to crashing the machine. On the other hand, if the failing device is some shared tape drive in a SAN with a few hundred alternatives, then killing the OS in an attempt to recover that drive is a problem.

Maybe the super-aggressive recovery paths should be reserved for devices marked critical to system operation.