Tejun Heo wrote:
>> The only constants seem to be libata and ICH7/8.
>> We must have a bug somewhere in there.
>>
>
> In piix mode or ahci mode? If in piix mode, ich7 and 8 would behave
> quite differently. ICH8 has SIDPR so it can hardreset while 7 can't.
> ICH SIDPR access had a hardware problem where write to SControl to
> clear DET is sometimes ignored which led to occasional hardreset
> failure which got fixed recently. The reason why ich's are involved
> in those incidents could just be that they are extremely popular.
>

It's a non-AHCI capable ICH7, so it's in piix mode.

> Things to try after such a complete drive shutdown are...
>

Unfortunately I can't do much with this box, as it's a rented box in a datacentre, however...

> * Soft reset the machine. Can BIOS recognize the drive?
>

Yes: if I 'echo b > /proc/sysrq-trigger', the BIOS recognises the drive and the box reboots normally.

> In many cases I've seen, it's usually that the drive's firmware is
> completely hung and only power cycling the drive brought it back. But
> then again, there have been some number of cases which didn't get
> diagnosed properly, so it's definitely possible that we're doing
> something wrong in the driver.
>
> Anyways, if it happens again, please try the above and try to find out
> whether the controller or the drive is hung. Also, please keep in
> mind that timeouts on 0xEA (flush) are very often indicative of power
>

OK, I didn't think I was seeing those - is it possible to tell from the detail which I posted in my original message?

As for the potential for PSU shenanigans - I don't have access to the box to fiddle with that, unfortunately. I believe I can stress the I/O subsystem quite heavily with dd and/or bonnie, yet it's only when polling for SMART status that these errors show up. I've just started dd (to the RAID mirror) + hdparm -I again to check...

Do the SMART error counters in the OP make this suspicious?

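For anyone wanting to repeat the polling half of that test, here is a minimal sketch of a loop that runs an identify or SMART query repeatedly and records how long each call takes. It is only an illustration: the helper name poll_command and its parameters are made up for this example, and the device path in the usage note is an assumption, since the actual device is not named in this message.

```python
import subprocess
import time


def poll_command(cmd, iterations, interval_s=1.0, timeout_s=60.0):
    """Run cmd repeatedly; return a list of (elapsed_seconds, returncode).

    returncode is None when the command did not finish within timeout_s.
    The timeout is best-effort: a process stuck in uninterruptible I/O
    may not be killable, in which case this call can still block.
    """
    results = []
    for _ in range(iterations):
        start = time.monotonic()
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
            returncode = proc.returncode
        except subprocess.TimeoutExpired:
            returncode = None
        # Record the wall-clock duration of this poll, then pause.
        results.append((time.monotonic() - start, returncode))
        time.sleep(interval_s)
    return results
```

With dd running against the mirror in another shell, something like poll_command(["hdparm", "-I", "/dev/sda"], iterations=100) (device path assumed) would give a list of timings to line up against the libata messages in dmesg. Swapping the command for ["smartctl", "-a", "/dev/sda"] allows the two tools to be compared under the same load.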
Is there likely to be any difference between running smartctl -a and hdparm -I in terms of the code path taken through the kernel, or timings on the hardware, as far as you know?

Cheers,
Tim.