On Wed, Sep 07, 2011 at 01:33:55AM +0900, Tejun Heo wrote:
> Hello,
>
> On Tue, Sep 06, 2011 at 01:19:44PM +0100, Bruce Stenning wrote:
> > ata4: EH complete
> > Waking error handler thread
> > scsi_eh_wakeup: succeeded
> > scsi_schedule_eh: succeeded
> > scsi_restart_operations: waking up host to restart
> > Error handler scsi_eh_3 sleeping
>
> I think the following should fix the problem. The code there is from
> the time when libata shared scsi_host->host_lock. libata no longer
> does that so, in the current state, host_eh_scheduled can be cleared
> with actual pending EH condition.

Hmmm... maybe not. Such a race condition exists iff host_eh_scheduled
is incremented outside of ap->lock, and I can't see how that could
happen. Weird. Can you please instrument the following?

* Print the caller of scsi_eh_wakeup(). "%pF" w/ (void *)_RET_IP_
  should do it.

* Print why scsi_eh is going back to sleep immediately, i.e.
  shost->host_failed, host_eh_scheduled and host_busy in
  scsi_error_handler().

It would also be nice to add some printks around schedule() and enable
PRINTK_TIME.

Thanks.

--
tejun
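
For reading the second printout, here is a rough userspace sketch of the
sleep decision as I understand it. It is not kernel code. The struct and
both helper names are hypothetical, and the condition in eh_would_sleep()
is an assumption about what scsi_error_handler() tests, so please check
it against the tree you are actually debugging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the three Scsi_Host counters to be printed. */
struct eh_counters {
	unsigned int host_failed;
	unsigned int host_eh_scheduled;
	unsigned int host_busy;
};

/*
 * Assumed sleep test: the EH thread goes back to sleep when nothing is
 * pending, or when failed commands do not yet account for every busy
 * command. Verify against scsi_error_handler() in your kernel.
 */
static bool eh_would_sleep(const struct eh_counters *c)
{
	assert(c != NULL);
	return (c->host_failed == 0 && c->host_eh_scheduled == 0) ||
	       c->host_failed != c->host_busy;
}

/* Format one debug line carrying the same three values plus the verdict. */
static int eh_format_state(char *buf, size_t len, int host_no,
			   const struct eh_counters *c)
{
	return snprintf(buf, len,
			"scsi_eh_%d: failed=%u eh_scheduled=%u busy=%u -> %s",
			host_no, c->host_failed, c->host_eh_scheduled,
			c->host_busy, eh_would_sleep(c) ? "sleep" : "run");
}
```

Under that assumed condition, a non-zero host_eh_scheduled is not enough
to keep the thread awake: it still sleeps while host_busy differs from
host_failed, which is why all three values are worth printing together.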