I wrote:
> I've been trying to track down data corruption I'm seeing on my
> server.

Turns out it was a bad disk. Not a media error, but maybe bad RAM or
logic on the drive.

> I saw an error with AHCI that I hadn't seen before with the other
> controllers.
...
> Because the error at [11588.19xx] was repeated 30 times, I suspected
> NCQ. I set the queue_depth on all 6 disks down to 1, and haven't seen
> the same problem since

It's not related to NCQ. I still saw the problem with it disabled, and
it finally went away when I enabled spread-spectrum clocking in BIOS,
even once I turned NCQ back on. So this report is bogus.

Still, it seems that some improvements could be made to the EH when this
sort of thing happens. For example, after "speed down requested but no
transfer mode left" a few times in a row, maybe it would make sense to
just fail the disk and give up. That would have allowed higher layers
like MD to recover.

-jim
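
To make the "fail the disk and give up" idea concrete, here is a rough
sketch of the policy in plain Python. It is not libata code and does not
reflect how the EH is actually structured. The threshold of 3 and the
names LinkState and record_eh_result are made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only for illustration; not a libata constant.
MAX_CONSECUTIVE_SPEED_DOWN_FAILURES = 3


@dataclass
class LinkState:
    """Per-disk bookkeeping for this sketch (not a libata structure)."""
    failed_speed_downs: int = 0
    disabled: bool = False


def record_eh_result(state: LinkState, speed_down_exhausted: bool) -> bool:
    """Update the state after one error-handling pass.

    speed_down_exhausted is True when the pass ended with "speed down
    requested but no transfer mode left". Returns True once the disk
    should be failed so that higher layers such as MD can recover.
    """
    if state.disabled:
        # Once given up on, the disk stays failed.
        return True
    if speed_down_exhausted:
        state.failed_speed_downs += 1
    else:
        # Any pass that did not hit the dead end resets the streak.
        state.failed_speed_downs = 0
    if state.failed_speed_downs >= MAX_CONSECUTIVE_SPEED_DOWN_FAILURES:
        state.disabled = True
    return state.disabled
```

The counter resets on any pass that does not hit the dead end, so only
an unbroken run of failures gets the disk kicked out. A one-off glitch
would not.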