On 02/17/2010 04:02 AM, Bjørn Mork wrote:
I'm trying to debug a problem I've been having a couple of times over the last few weeks, where this machine hangs *really* hard: No console output at all, and absolutely no response on the console - not even using the magic SysRq key (using "break" on the serial console. This is tested and known to be fully functional under normal conditions).

I really have no clue where to start to locate the cause of this, but after rebooting the last time there were a few libata error messages which puzzle me so I might as well start here. Understanding these errors will be useful in itself. And they are related to the last pieces of hardware added (SiI 3132 controller attached to a SiI 4726 port multiplier with 3 disks), which make them more suspicious in my eyes...

These are the messages I worry about (full dmesg is included below):

[ 63.865723] ata8.00: log page 10h reported inactive tag 0
[ 63.943744] ata8.00: exception Emask 0x1 SAct 0x3c SErr 0x0 action 0x0
[ 64.107789] ata8.00: irq_stat 0x03060002, device error via SDB FIS
[ 64.199764] ata8.00: cmd 60/08:10:00:02:00/00:00:00:00:00/40 tag 2 ncq 4096 in
[ 64.199765] res 60/08:10:00:02:00/00:00:00:00:00/40 Emask 0x1 (device error)
[ 64.580730] ata8.00: status: { DRDY DF }
[ 64.628731] ata8.00: cmd 60/78:18:08:02:00/00:00:00:00:00/40 tag 3 ncq 61440 in
[ 64.628732] res 60/78:18:08:02:00/00:00:00:00:00/40 Emask 0x89 (media error)
[ 64.810084] ata8.00: status: { DRDY DF }
[ 64.879835] ata8.00: error: { UNC IDNF }
[ 64.930935] ata8.00: cmd 60/18:20:e8:00:00/00:00:00:00:00/40 tag 4 ncq 12288 in
[ 64.930936] res 56/1b:02:02:00:00/00:00:00:40:56/00 Emask 0x1 (device error)
[ 65.204495] ata8.00: status: { DRDY }
[ 65.248497] ata8.00: error: { IDNF }
[ 65.292407] ata8.00: cmd 60/80:28:80:01:00/00:00:00:00:00/40 tag 5 ncq 65536 in
[ 65.292408] res 56/1b:02:02:00:00/00:00:00:50:56/00 Emask 0x1 (device error)
[ 65.590268] ata8.00: status: { DRDY }
[ 65.658016] ata8.00: error: { IDNF }
[ 65.725346] ata8.00: configured for UDMA/100
[ 65.780607] sd 7:0:0:0: [sdd] Device not ready: Sense Key : Not Ready [current] [descriptor]
[ 65.925384] sd 7:0:0:0: [sdd] Device not ready: Add. Sense: Logical unit not ready, cause not reportable
[ 66.040355] end_request: I/O error, dev sdd, sector 520
[ 66.103442] ata8: EH complete
[ 66.137877] sd 7:0:0:0: [sdd] 3907029168 512-byte hardware sectors (2000399 MB)
[ 66.228486] sd 7:0:0:0: [sdd] Write Protect is off
[ 66.285654] sd 7:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 66.285676] sd 7:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

These errors appeared after power cycling the hanging machine, and may therefore just as well be symptoms as a cause. As you see, the error handling is successful and all drives are working as expected. So I guess this might just be a harmless warning caused by an unrelated hang and the unexpected power cycling.

Anyway, here are the details of this controller in case they are of interest:
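Well, those errors indicate that the drive itself reported read errors on some sectors (the UNC bit in the error register means uncorrectable data). This could be caused by a hard power-down in the middle of a write to those sectors. In that case, one can in principle use something like hdparm --write-sector to rewrite the affected sector; that overwrites it with zeros, so its old contents are lost, but the drive can then store it cleanly again. However, it could also be due to a drive fault.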
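To see which sectors the failed commands were actually reading, the "cmd" lines in the log can be decoded. Below is a minimal Python sketch, assuming libata's usual taskfile dump order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the sector count travels in the feature fields and the tag in the upper bits of nsect. The name decode_ncq_cmd is made up for illustration and is not part of any tool.

```python
import re

# Matches the 12-byte "cmd" taskfile dump libata prints during error handling.
CMD_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2}) tag (\d+)")


def decode_ncq_cmd(line):
    """Decode an NCQ read/write 'cmd' log line into tag, start LBA and length.

    Returns None if the line has no cmd dump or is not an NCQ command.
    """
    m = CMD_RE.search(line)
    if m is None:
        return None
    fields = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = fields
    # 0x60 = READ FPDMA QUEUED, 0x61 = WRITE FPDMA QUEUED
    if command not in (0x60, 0x61):
        return None
    # 48-bit LBA: low three bytes first, then the "hob" (high order) bytes.
    lba = (lbal | (lbam << 8) | (lbah << 16)
           | (hob_lbal << 24) | (hob_lbam << 32) | (hob_lbah << 40))
    # For NCQ the sector count is carried in the feature fields.
    sectors = feature | (hob_feature << 8)
    if sectors == 0:
        sectors = 65536  # a count of zero means 65536 sectors
    return {
        "tag": nsect >> 3,          # NCQ tag sits in bits 7:3 of nsect
        "lba": lba,
        "sectors": sectors,
        "write": command == 0x61,
    }


if __name__ == "__main__":
    log = [
        "ata8.00: cmd 60/08:10:00:02:00/00:00:00:00:00/40 tag 2 ncq 4096 in",
        "ata8.00: cmd 60/78:18:08:02:00/00:00:00:00:00/40 tag 3 ncq 61440 in",
        "ata8.00: cmd 60/18:20:e8:00:00/00:00:00:00:00/40 tag 4 ncq 12288 in",
        "ata8.00: cmd 60/80:28:80:01:00/00:00:00:00:00/40 tag 5 ncq 65536 in",
    ]
    for entry in log:
        print(decode_ncq_cmd(entry))
```

For the tag 3 command, the one flagged as a media error, this gives a start LBA of 520 and a length of 120 sectors, which agrees with the "I/O error, dev sdd, sector 520" line and with the 61440-byte transfer size in the log. Reading that range back (for example with hdparm --read-sector) before rewriting anything would show whether the error is still there.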