> Can you please add some debug printk's to scsi_schedule_eh() and see
> whether scsi_eh_wakeup() is invoked from there? It seems likely that
> the problem is caused by race conditions around
> SHOST_[CANCEL_]RECOVERY flags.

I managed to reproduce the lockup again yesterday with a slightly different mix of tracing, including tracing in scsi_eh_wakeup() and scsi_schedule_eh(). It looks like the EH is being scheduled, but the EH thread goes straight back to sleep and doesn't wake up:

    ata4: EH complete
    Waking error handler thread
    scsi_eh_wakeup: succeeded
    scsi_schedule_eh: succeeded
    scsi_restart_operations: waking up host to restart
    Error handler scsi_eh_3 sleeping

Is the wakeup for scsi_eh_3 being attempted while scsi_error_handler() is still processing an earlier EH, which then calls scsi_restart_operations() and puts scsi_eh_3 back to sleep?

Some time after the lockup, the trace showed SCSI operations timing out, but the port was still unresponsive. The unit is not entirely stable in this state: our application software could no longer strobe softdog, so the unit rebooted. Enough was still running, however, for the serial console to remain responsive until the reboot.

Thanks,
Bruce.
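The question above describes a classic lost-wakeup pattern: a wakeup arrives while the handler thread is already running, so it has no effect, and the thread then goes back to sleep without rechecking for new work. The following is a minimal sketch of that pattern, assuming a simplified single-threaded model. `struct eh_model`, `eh_request()`, `eh_begin_pass()`, the two `eh_end_pass_*()` variants and `lost_wakeup()` are all hypothetical names for illustration. They are not the real SCSI EH code and make no claim about how scsi_error_handler() actually behaves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an error-handler thread; NOT the real SCSI EH code. */
struct eh_model {
    bool sleeping; /* thread is blocked waiting for work */
    int  pending;  /* EH requests recorded but not yet handled */
};

/* Requester side: record the request, then wake the thread if it sleeps.
 * Waking a thread that is already running has no further effect. */
void eh_request(struct eh_model *m)
{
    m->pending++;
    if (m->sleeping)
        m->sleeping = false;
}

/* Start of a pass: the thread takes one request to handle. */
void eh_begin_pass(struct eh_model *m)
{
    if (m->pending > 0)
        m->pending--;
}

/* Naive end of pass: sleep unconditionally. A request that arrived
 * during the pass stays pending with nobody left to service it. */
void eh_end_pass_naive(struct eh_model *m)
{
    m->sleeping = true;
}

/* Checked end of pass: only sleep if nothing arrived in the meantime. */
void eh_end_pass_checked(struct eh_model *m)
{
    m->sleeping = (m->pending == 0);
}

/* Replays "second request arrives mid-pass" and returns true if the
 * thread ends up asleep with work still pending (the wakeup was lost). */
bool lost_wakeup(void (*end_pass)(struct eh_model *))
{
    assert(end_pass);
    struct eh_model m = { .sleeping = true, .pending = 0 };
    eh_request(&m);    /* request 1 wakes the thread */
    eh_begin_pass(&m); /* thread starts handling request 1 */
    eh_request(&m);    /* request 2 arrives while the thread is busy */
    end_pass(&m);      /* pass for request 1 finishes */
    return m.sleeping && m.pending > 0;
}
```

In this model `lost_wakeup(eh_end_pass_naive)` returns true and `lost_wakeup(eh_end_pass_checked)` returns false. The only difference is whether the thread rechecks for outstanding requests before it sleeps, which is the kind of window the SHOST_[CANCEL_]RECOVERY race hypothesis would point at.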