The subject of this thread has been adjusted to reflect my revised understanding of the problem. I had previously thought the problem with a RAID1 re-sync generating timeout errors on the active (reading) disk appeared with 2.6.19, but it turns out the problem was also present in 2.6.18. I was running the 2.6.18 tests with a modular kernel and hadn't loaded the ahci module in that scenario (so there was no queueing), but was using a kernel with SATA built in for the 2.6.19 test (which enabled queueing). When ahci is loaded with 2.6.18, the same timeout problem appears.

The distinguishing factor appears to be the queue depth (4 works; 5 and various values up to and including 31 fail), not the kernel version. I am going to try running with the queue depth clamped at 4 to see whether this consistently masks the problem. If I have the time, I may also try some more experiments, like instrumenting which command was issued right before the group of commands that all time out, or increasing the SCSI timeout, to get more insight into what is going on at the time of the failure. I'd be happy to run any tests people with expertise in this area can suggest to help better understand what is happening.

My guess is that the disk becomes wedged somehow at this point. For what it's worth, it is usually possible to regain control of the system if I stop the RAID re-sync after the first timeout is reported, but when I let it try to run to completion, the SATA subsystem usually ends up stuck issuing reset after reset with no success, and a power-cycle is the only recourse.
--
Mike Accetta
ECI Telecom Ltd.
Data Networking Division (previously Laurel Networks)
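
P.S. For anyone wanting to try the same work-around, the knobs I have in mind are the per-device sysfs attributes, along these lines (sda is just an example for the resync source disk; I haven't yet confirmed that ahci honors a runtime queue_depth change on these kernels):

  # clamp the NCQ depth to 4 on the reading disk
  echo 4 > /sys/block/sda/device/queue_depth
  # raise the SCSI command timeout from the default 30s to 60s
  echo 60 > /sys/block/sda/device/timeout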