Hello,

while testing mpt fusion patches, I noticed that the error handler in 2.6.28-rcX doesn't work any more.

From scsi_eh_scmd_add() the function scsi_eh_wakeup() is called, which activates the error handler only if shost->host_busy == shost->host_failed. However, in about 90% of my test cases the end result is shost->host_failed == shost->host_busy + 1.

Because scsi_eh_scmd_add() holds shost->host_lock, which also covers the shost->host_failed++ increment, scsi_eh_wakeup() still activates the error handler. But scsi_error_handler() does another check, shost->host_failed != shost->host_busy, and by the time it reaches this point shost->host_failed is usually already shost->host_busy + 1, so scsi_error_handler() doesn't do anything at all. Since all commands have been queued for the error handler, access to this specific device is locked up forever.

I tried to bisect the problem and it points to this commit:

242f9dcb8ba6f68fcd217a119a7648a4f69290e9 is first bad commit
commit 242f9dcb8ba6f68fcd217a119a7648a4f69290e9
Author: Jens Axboe <jens.axboe@xxxxxxxxxx>
Date:   Sun Sep 14 05:55:09 2008 -0700

    block: unify request timeout handling

I'm not absolutely sure, though, since the error handler fails only most of the time, not always. I verified each 'good' bisection step twice, but from a statistical point of view that is not really sufficient. Also, on the one hand this commit doesn't seem to directly change the logic of host_failed or host_busy, but on the other hand it is related to timeouts, which is what actually activates the error handler in my test cases.

Suggestions?

-- 
Bernd Schubert
Q-Leap Networks GmbH
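
For illustration, the two checks described above can be modeled in a few lines of user-space C. This is only a simplified sketch, not the kernel source: struct host_model and the eh_model_* helpers are hypothetical names, locking is left out, and only the host_busy/host_failed comparisons mentioned in the report are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the two Scsi_Host counters. */
struct host_model {
    unsigned int host_busy;   /* commands currently outstanding */
    unsigned int host_failed; /* commands handed to the error handler */
};

/* Models the host_failed++ done under host_lock in scsi_eh_scmd_add(). */
static void eh_model_add_failed(struct host_model *h)
{
    assert(h != NULL);
    h->host_failed++;
}

/* Models the wake-up condition described for scsi_eh_wakeup(). */
static bool eh_model_should_wake(const struct host_model *h)
{
    return h->host_busy == h->host_failed;
}

/*
 * Models the second check described for scsi_error_handler():
 * the handler does nothing while the two counters differ.
 */
static bool eh_model_handler_runs(const struct host_model *h)
{
    return h->host_failed == h->host_busy;
}
```

With host_busy = 3 and host_failed reaching 3, both checks pass. One more increment gives host_failed == host_busy + 1, the second check fails, and the model handler stays idle, which matches the stall described in the report.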