I'm trying to understand what causes a drive that is performing a deep recovery cycle to be kicked out of a Linux software RAID (md) array. I'd also like to know whether setting an ERC (error recovery control) limit is the only option, or whether increasing the SCSI command timeout is a reasonable alternative.

As I understand it, if the deep recovery goes on long enough, the SCSI command timeout is exceeded. The SCSI error handler then attempts to abort the command and reset the device, bus, or host. If these error-handling steps fail, the drive is set offline (which I assume is what kicks it out of the array). ERC helps in this scenario because the drive returns an error before the timeout is exceeded. The SCSI layer then passes that error up to the md/RAID layer, which can take the appropriate action (retry the operation, recover the data from a redundant source and rewrite it, kick the disk, or whatever).

I have also read that the SCSI command timeout can be tuned via /sys/block/.../device/timeout, and that it defaults to 30 seconds. Would raising this timeout to a large value likewise prevent deep recovery cycles from causing the SCSI layer to set the drive offline? Does anyone know the maximum time a deep recovery cycle can take?

Or might many commands be queued behind the access to the bad sector? In that case, increasing the SCSI command timeout would only help the first command. The rest of the queued commands would be delayed exponentially, so raising the timeout might not be a feasible way to avoid the problem.

I'd appreciate your comments, and corrections if I've made any mistaken assumptions above.

Thanks!
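For reference, here is a Python sketch of the two knobs the question compares. The helpers are hypothetical and not an existing tool: `timeout_path`, `read_timeout`, `write_timeout` and `scterc_command` are illustrative names. The sketch assumes the usual sysfs layout (/sys/block/<dev>/device/timeout, value in seconds) and smartctl's `-l scterc,READTIME,WRITETIME` option, which takes its times in units of 100 ms. The 180 s and 7 s values in the demo are illustrative, not figures established in this thread.

```python
from pathlib import Path
from typing import List

# Root of the per-disk sysfs tree on Linux (assumed standard layout).
SYSFS_BLOCK = Path("/sys/block")


def timeout_path(dev: str, root: Path = SYSFS_BLOCK) -> Path:
    """Path of the SCSI command timeout attribute, e.g. /sys/block/sda/device/timeout."""
    return root / dev / "device" / "timeout"


def read_timeout(dev: str, root: Path = SYSFS_BLOCK) -> int:
    """Current SCSI command timeout for `dev`, in seconds (30 by default)."""
    return int(timeout_path(dev, root).read_text().strip())


def write_timeout(dev: str, seconds: int, root: Path = SYSFS_BLOCK) -> None:
    """Set the timeout. On a real system this needs root and does not survive a reboot."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    timeout_path(dev, root).write_text(f"{seconds}\n")


def scterc_command(dev: str, read_s: float, write_s: float) -> List[str]:
    """Build (but do not run) a smartctl command that sets the drive's ERC limits.

    smartctl expects SCT ERC times in units of 100 ms, so 7.0 s becomes 70.
    """
    read_ds, write_ds = round(read_s * 10), round(write_s * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", f"/dev/{dev}"]


if __name__ == "__main__":
    import tempfile

    # Exercise the helpers against a throwaway fake sysfs tree, not the real one.
    with tempfile.TemporaryDirectory() as d:
        fake = Path(d)
        timeout_path("sda", fake).parent.mkdir(parents=True)
        timeout_path("sda", fake).write_text("30\n")
        write_timeout("sda", 180, fake)  # illustrative value only
        print(read_timeout("sda", fake))  # 180
    print(" ".join(scterc_command("sda", 7.0, 7.0)))  # smartctl -l scterc,70,70 /dev/sda
```

The sketch builds the smartctl argument list instead of running it so that it can be tested without a real drive. On a live system the list could be passed to `subprocess.run`. Neither setting is permanent. The sysfs value resets on reboot, and on many drives the ERC setting is lost on a power cycle, so both would normally be reapplied at boot.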