Hello,

On 02/07/2010 02:30 AM, Mark Lord wrote:
>>> * Soft reset the machine.  Can BIOS recognize the drive?
>>
>> Yes, if I either 'echo b > /proc/sysrq-trigger', then the BIOS
>> recognises the drive, and the box reboot normally.

Hmmm... this means one of the following.

1. The controller side is hung and needs some sort of reset or
   reinitialization to get working again.

2. The drive is hung, requiring a hardreset to continue.  ata_piix
   currently can't do hardresets on ich7, but resetting the machine
   will definitely generate hardresets.

3. The BIOS actually power-cycles the machine when told to reboot.
   Some BIOSen do this.

No chance you can access the machine there?

>>> Anyways, if it happens again, please try the above and try to find out
>>> whether the controller or the drive is hung.  Also, please keep in
>>> mind that timeouts on 0xEA (flush) is very often indicative of power
>>>
>>
>> OK, I didn't think I was seeing those - is it possible to tell from the
>> detail which I posted in my original message? As for the potential for
>> PSU shenanigans - I don't have access to the box to fiddle with that,
>> unfortunately, but I believe I can stress the I/O subsystem quite
>> heavily with dd and/or bonnie, but it's only when polling for SMART
>> status that these errors show up. I've just started dd (to RAID mirror)
>> + hdparm -I again to check...

Oh... if that's the case, a PSU problem wouldn't be very likely.

>> Do the SMART error counters in the OP make this suspicious? Is there
>> likely to be any different between running smartctl -a and hdparm -I in
>> terms of code path taken though the kernel, or timings on the hardware,
>> as far as you know?

From the driver's POV, hdparm and SMART commands behave pretty much the
same.  They travel through the same high/mid layer paths and get
issued using the same command protocol.  From the drive's POV, I imagine
it can be pretty different, though.

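If it helps to narrow down which of the two actually trips the problem,
something like the following rough, untested sketch could alternate
them while dd runs in another shell.  The device path, round count and
interval are placeholders of mine, not taken from your report, and it
defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
import time


def poll_commands(device):
    """Return the two commands to alternate between for one device."""
    return [
        ["hdparm", "-I", device],    # ask the drive for its identify data
        ["smartctl", "-a", device],  # read SMART information
    ]


def poll(device, rounds, interval, dry_run=True):
    """Alternate the commands; return (command line, exit status) pairs.

    In dry-run mode nothing is executed and the status is None.
    """
    results = []
    for _ in range(rounds):
        for cmd in poll_commands(device):
            line = shlex.join(cmd)
            if dry_run:
                print(line)
                status = None
            else:
                # Output is discarded; only the exit status is recorded.
                status = subprocess.run(
                    cmd,
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                ).returncode
            results.append((line, status))
        if not dry_run:
            time.sleep(interval)
    return results


if __name__ == "__main__":
    # /dev/sdX is a placeholder; pass dry_run=False (as root) to really run.
    poll("/dev/sdX", rounds=2, interval=5.0)
```

Watching dmesg for the timeout while this runs should show whether one
command is enough on its own or whether it takes both.
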
> My theory on the problem when I first had it here, was that doing
> a FLUSH_CACHE[_EXT] before any PIO command (eg. SMART) should prevent
> the problem.  This was never explored further (by me or others).

If that's the case, what would that mean?  Would it be some nasty
interaction inside the drive firmware?

Thanks.

-- 
tejun