On Fri, 4 Jul 2008 19:50:17 +0100, Aneurin Price wrote:
>>>[1382260.429883] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>>>[1382260.429931] ata1.00: cmd 25/00:50:27:6e:cd/00:00:15:00:00/e0 tag 0 dma 40960 in
>>>[1382260.429933] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>>>[1382260.429956] ata1.00: status: { DRDY }
>>>[1382265.796276] ata1: port is slow to respond, please be patient (Status 0xff)
>>>[1382270.473163] ata1: device not ready (errno=-16), forcing hardreset
>>>[1382270.473179] ata1: hard resetting link
>>>[1382276.679024] ata1: port is slow to respond, please be patient (Status 0xff)
>>>[1382280.476592] ata1: COMRESET failed (errno=-16)
>>>[1382280.476626] ata1: hard resetting link
>>>[1382286.692400] ata1: port is slow to respond, please be patient (Status 0xff)
>>>[1382290.529795] ata1: COMRESET failed (errno=-16)
>>>[1382290.529829] ata1: hard resetting link
>>>[1382296.745702] ata1: port is slow to respond, please be patient (Status 0xff)
>>>[1382325.566448] ata1: COMRESET failed (errno=-16)
>>>[1382325.566484] ata1: limiting SATA link speed to 1.5 Gbps
>>>[1382325.566487] ata1: hard resetting link
>>>[1382330.573112] ata1: COMRESET failed (errno=-16)
>>>[1382330.573146] ata1: reset failed, giving up
>>>[1382330.573162] ata1.00: disabled
>>>[1382330.573188] ata1: exception Emask 0x10 SAct 0x0 SErr 0x190002 action 0xa frozen t4
>>>[1382330.573212] ata1: hotplug_status 0x10
>>>[1382330.573226] ata1: SError: { RecovComm PHYRdyChg 10B8B Dispar }
>> ...
>>>[1382571.052939] ata1: EH pending after 5 tries, giving up
>>
>> These are signs of the disk going offline, or the communication between
>> the controller and the disk being corrupted. That's a hardware issue,
>> not unlike what we see with bad PSUs.
>>
>> The 2.6.24 kernel lacks two post-2.6.24 sata_promise bug fixes.
>> The first fixes a problem where error recovery may trigger unexpected
>> hotplug events (we see those in your log), the second fixes a potential
>> problem in interrupt status clearing operations.
>>
>
>Does this mean that it could potentially be possible to recover from this error,
>even without nailing the cause?

In your log the stray hotplug events occur only after several failed
COMRESET attempts. I don't know whether fixing the stray hotplug events
has any effect on the COMRESETs. Try the patch; it won't do any harm.

> Are random hardware problems of this sort quite
>common, and papered over by good drivers as a matter of course?

I wouldn't say "common". It seems to vary a lot from machine to machine.
As for papering over, that's what the error recovery handling in libata
and the driver is supposed to do, although it's clearly not always
effective.

/Mikael
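
If you want to check the same ordering (hotplug events showing up only
after failed COMRESETs) in another log, a rough sketch along these lines
may help. It is not part of libata or any kernel tool; the function name
first_hotplug_after_resets and the SAMPLE text are made up for
illustration, and the only thing it assumes is the "[timestamp] message"
layout seen in the log quoted above.

```python
import re

# Matches the "[seconds.micros] message" prefix seen in the quoted dmesg output.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def first_hotplug_after_resets(log_text):
    """Count "COMRESET failed" lines seen before the first hotplug_status line.

    Returns (failure_count, hotplug_timestamp). The timestamp is None when
    the log contains no hotplug_status line at all.
    """
    failures = 0
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # not a timestamped kernel log line
        stamp, msg = float(match.group(1)), match.group(2)
        if "hotplug_status" in msg:
            return failures, stamp
        if "COMRESET failed" in msg:
            failures += 1
    return failures, None


# Hypothetical excerpt, copied from the relevant lines of the log above.
SAMPLE = """\
[1382280.476592] ata1: COMRESET failed (errno=-16)
[1382290.529795] ata1: COMRESET failed (errno=-16)
[1382325.566448] ata1: COMRESET failed (errno=-16)
[1382330.573112] ata1: COMRESET failed (errno=-16)
[1382330.573212] ata1: hotplug_status 0x10
"""

if __name__ == "__main__":
    print(first_hotplug_after_resets(SAMPLE))
```

A non-zero count before the first hotplug line matches what is described
above: the resets were already failing before any stray hotplug event
appeared.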