On 12/7/2012 3:20 PM, Mike Christie wrote:
> On 12/07/2012 03:05 PM, Jeremy Linton wrote:
>> That said, it's far from perfect. The code (as I understand it) isn't
>> differentiating between isolating the failure, or bringing out the big
>> hammer in an attempt to correct problems on a specific I_T_L. If you
>> drop/reset the I_T because one of the LUNs is misbehaving before
>> verifying the status of other LUNs on the target, you risk interrupting
>> operations to functional devices.
>
> When this code is called the scsi eh has run the abort handler for each
> outstanding command and that has failed, and it has run the lun/device
> reset handler and that has failed (or the eh operations succeeded but the
> TUR checkup the scsi eh does failed).

My issue is with the error handler rather than with this patch in
particular. When scsi_eh_bus_device_reset (which maps to a LUN reset)
fails, the handler falls through to scsi_eh_target_reset, which issues a
TARGET RESET. That broadens the problem to devices that may be working
fine and just happen to be on the same I_T.

I think there should be some attempt to determine whether there are other
devices on the I_T, and whether they have also failed, before going into
target_reset. It looks like there may have been a plan to do that in
bus_device_reset, but it doesn't appear to be complete.

All that said, I have a few things I wonder about in the
eh_bus_device_reset code. For one, it uses TUR rather than a command with
a more straightforward return status, such as INQUIRY, which also
preserves the check conditions.
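
To make the suggestion about checking the other devices on the I_T more
concrete, here is a rough sketch of the decision I have in mind. It is a
standalone model, not kernel code: struct lun, enum eh_action and both
function names are made up for illustration, and the policy (skip the
target reset if any sibling LUN still answers) is only one possible choice.
How each LUN's state gets probed (TUR, INQUIRY, ...) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a LUN; NOT the kernel's struct scsi_device. */
enum lun_state { LUN_OK, LUN_FAILED };

struct lun {
    unsigned int id;
    enum lun_state state;   /* outcome of whatever probe the eh used */
};

/* Hypothetical outcomes once a LUN reset has failed. */
enum eh_action {
    EH_ISOLATE_LUN,    /* give up on the failed LUN only */
    EH_TARGET_RESET    /* escalate to the whole I_T */
};

/* Count the LUNs on the same I_T, other than the failed one, that respond. */
static size_t healthy_siblings(const struct lun *luns, size_t n,
                               size_t failed_idx)
{
    size_t healthy = 0;

    for (size_t i = 0; i < n; i++) {
        if (i != failed_idx && luns[i].state == LUN_OK)
            healthy++;
    }
    return healthy;
}

/*
 * Assumed policy: if any sibling LUN is still working, do not reset the
 * target; escalate only when nothing else on the I_T would be disturbed.
 */
static enum eh_action next_action_after_lun_reset_failure(
        const struct lun *luns, size_t n, size_t failed_idx)
{
    assert(failed_idx < n);

    if (healthy_siblings(luns, n, failed_idx) > 0)
        return EH_ISOLATE_LUN;
    return EH_TARGET_RESET;
}
```

With three LUNs where only LUN 0 has failed, this returns EH_ISOLATE_LUN;
if every LUN on the I_T has failed, or the failed LUN is the only one, it
returns EH_TARGET_RESET, which matches what the handler does today.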