Hi Tejun,

Sorry for sending so many emails yesterday; I blame the dental anaesthetic I received in the morning for being so jumpy on the send button ;-)

> Can you please instrument the followings?
>
> * print the caller of scsi_eh_wakeup(). "%pF" w/ (void *)_RET_IP_
>   should do it.
>
> * print why scsi_eh is going back to sleep immediately.
>   ie. shost->host_failed, host_eh_scheduled, host_busy in
>   scsi_error_handler(). It would also be nice to add some printks
>   around schedule() and enable PRINTK_TIME.

I can certainly try this.

Could you confirm whether my thoughts about a race between the scsi_eh thread and the wake-up are plausible? I backtracked yesterday because I thought the scsi_eh thread would get rescheduled naturally. I hadn't realised that when the task state is TASK_INTERRUPTIBLE, schedule() takes the task off the run queue, so it needs to be woken explicitly.

Here is my thinking again. shost->host_eh_scheduled is read here in scsi_error_handler():

	set_current_state(TASK_INTERRUPTIBLE);
	while (!kthread_should_stop()) {
		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||

There's no locking in scsi_error_handler() itself (though the functions it calls probably take locks).

When scheduling an EH, scsi_schedule_eh() takes shost->host_lock, increments shost->host_eh_scheduled, and then wakes the EH thread. Suppose this happens after the scsi_eh thread has read host_eh_scheduled but before it puts itself back to sleep (while its state is TASK_INTERRUPTIBLE). Then nothing will wake the thread again, and host_eh_scheduled will not be inspected. host_eh_scheduled is left stuck at 1 with the scsi_eh thread asleep. The thread won't be woken again because the ATA port has been frozen and its IRQs are masked off.

Bruce.
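To make the suspected interleaving concrete, here is a deliberately simplified, single-threaded C model of the sleep/wake-up handshake. It is not kernel code, and every name in it (model_task, model_wake_up(), model_schedule(), lost_wakeup()) is hypothetical. It assumes the usual kernel semantics: wake_up_process() puts a task whose state is TASK_INTERRUPTIBLE back to TASK_RUNNING even if it has not called schedule() yet, and schedule() only takes the caller off the run queue if its state is still non-running. The model replays the window described above. The EH thread reads host_eh_scheduled == 0, scsi_schedule_eh() increments the counter and issues the wake-up, and then the EH thread calls schedule(). It compares the ordering in the quoted loop, where the state is set before the check, with a hypothetical ordering where the state is set after the check.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel task states (hypothetical model). */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };

struct model_task {
	enum model_state state;
	int on_runqueue;	/* 1 if the scheduler will run this task again */
};

/* Model of wake_up_process(): a task in the interruptible state is made
 * runnable again, whether or not it has reached schedule() yet. */
static void model_wake_up(struct model_task *t)
{
	if (t->state == MODEL_INTERRUPTIBLE) {
		t->state = MODEL_RUNNING;
		t->on_runqueue = 1;
	}
}

/* Model of schedule(): the task only leaves the run queue if its state
 * is still interruptible when it gets here. */
static void model_schedule(struct model_task *t)
{
	if (t->state == MODEL_INTERRUPTIBLE)
		t->on_runqueue = 0;
}

/* Replay the suspected race. Returns 1 if EH work is left pending while
 * the EH thread is off the run queue (a lost wake-up), else 0. */
static int lost_wakeup(int set_state_before_check)
{
	int host_eh_scheduled = 0;
	struct model_task eh = { MODEL_RUNNING, 1 };
	int idle;

	if (set_state_before_check)
		eh.state = MODEL_INTERRUPTIBLE;	/* ordering in the quoted loop */

	idle = (host_eh_scheduled == 0);	/* EH thread reads the counter */

	/* Race window: scsi_schedule_eh() runs on another CPU here. */
	host_eh_scheduled++;
	assert(host_eh_scheduled == 1);
	model_wake_up(&eh);

	if (!set_state_before_check)
		eh.state = MODEL_INTERRUPTIBLE;	/* hypothetical late ordering */

	if (idle)
		model_schedule(&eh);

	return host_eh_scheduled > 0 && !eh.on_runqueue;
}
```

In this model, lost_wakeup(1) returns 0 and lost_wakeup(0) returns 1, so whether the window is real depends on where the state is set relative to the read. The model says nothing about host_failed or host_busy, or about any path in the real loop that resets the state after the check. Those are what the suggested printks around schedule() should show.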