On 02/27/2011 11:10 AM, Michał Piotrowski wrote:
> Hi,
>
> This can be a hardware problem - hard to say. For some reason on one
> of the disks smart test is interrupted
>
> # 1 Extended offline Interrupted (host reset) 90% 12489 -
> # 2 Extended offline Interrupted (host reset) 90% 12484 -
>
>
> I see this in dmesg
>
> [ 4328.800100] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> [ 4328.800129] ata3.00: failed command: WRITE DMA EXT
> [ 4328.800153] ata3.00: cmd 35/00:08:7e:dc:9f/00:00:2c:00:00/e0 tag 0 dma 4096 out
> [ 4328.800157] res 40/00:00:02:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> [ 4328.800190] ata3.00: status: { DRDY }
> [ 4333.849048] ata3: link is slow to respond, please be patient (ready=0)
> [ 4338.847048] ata3: device not ready (errno=-16), forcing hardreset
> [ 4338.847063] ata3: soft resetting link
> [ 4339.837375] ata3.00: configured for UDMA/133
> [ 4339.837407] ata3: EH complete
>
> I'm using 2.6.37.2 with config based on an old rawhide 2.6.37. I have
> not noticed other problems with this disc. What might be causing this
> interrupts?

I've been having similar problems lately. First my laptop, and I
assumed a hardware problem, so I replaced the HDD. Then the server
started doing it, which seemed quite a coincidence, but because its
uptime was around two months at the time and it was still running a
2.6.35.9 kernel while my laptop problems started with 2.6.35.11, I
thought it was just coincidence. Now, if you bring this up, I'm not so
sure.

Here's what I saw happening on the laptop:

[ 1199.706084] ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
[ 1199.706094] ata1.00: failed command: WRITE FPDMA QUEUED
[ 1199.706101] ata1.00: cmd 61/08:00:67:48:3f/00:00:16:00:00/40 tag 0 ncq 4096 out
[ 1199.706106] ata1.00: status: { DRDY }
(repeat the above 3 lines many times)
[ 1199.706533] ata1: hard resetting link
[ 1209.754149] ata1: softreset failed (device not ready)
[ 1209.754155] ata1: hard resetting link
[ 1219.802039] ata1: softreset failed (device not ready)
[ 1219.802046] ata1: hard resetting link
[ 1230.360039] ata1: link is slow to respond, please be patient (ready=0)
[ 1239.438047] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1239.444280] ata1.00: configured for UDMA/133
[ 1239.444286] ata1.00: device reported invalid CHS sector 0
(repeat above 1 line many times)
[ 1239.444431] ata1: EH complete
[ 1318.752164] ata1.00: exception Emask 0x0 SAct 0x70040b0 SErr 0x0 action 0x6 frozen
[ 1318.752171] ata1.00: failed command: WRITE FPDMA QUEUED
[ 1318.752178] ata1.00: cmd 61/48:20:e7:1b:ac/00:00:22:00:00/40 tag 4 ncq 36864 out
[ 1318.752183] ata1.00: status: { DRDY }
(repeat above 3 lines many many times, lather, rinse, repeat)

In the meantime, the system almost completely freezes up, and the disk
activity light stays on.

On the server:

[6968144.832829] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[6968144.832829] ata1.00: failed command: READ MULTIPLE
[6968144.832829] ata1.00: cmd c4/00:20:fb:a5:df/00:00:00:00:00/ef tag 0 pio 16384 in
[6968144.832829] ata1.00: status: { DRDY ERR }
[6968144.832829] ata1.00: error: { UNC }
[6968144.852104] ata1.00: configured for PIO0
[6968144.852125] ata1: EH complete
(repeat above 7 lines several times)

And again, the system almost completely freezes up, except that it
still routes traffic through it in the meantime.
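
For anyone comparing logs like these: as far as I understand libata, the SAct value in the "exception" lines is a bitmask of the NCQ tags that were still outstanding when the port froze. The sketch below is only an illustration under that assumption; the names EXC_RE, active_tags and summarize are made up for this example and are not part of any kernel tool.

```python
import re

# Matches libata "exception" lines like the ones quoted in this thread.
# The pattern is an assumption based on those lines, not a kernel-defined format spec.
EXC_RE = re.compile(
    r"(ata\d+(?:\.\d+)?): exception Emask (0x[0-9a-f]+) SAct (0x[0-9a-f]+)"
)


def active_tags(sact):
    """Return the bit positions set in an SAct value (assumed: one bit per NCQ tag)."""
    return [bit for bit in range(32) if (sact >> bit) & 1]


def summarize(lines):
    """Yield (device, tags) for every exception line found in an iterable of log lines."""
    for line in lines:
        match = EXC_RE.search(line)
        if match:
            yield match.group(1), active_tags(int(match.group(3), 16))


if __name__ == "__main__":
    sample = [
        "[ 1318.752164] ata1.00: exception Emask 0x0 SAct 0x70040b0 SErr 0x0 action 0x6 frozen",
    ]
    for device, tags in summarize(sample):
        print(device, len(tags), "tags outstanding:", tags)
```

Read this way, 0x7fffffff in the first laptop exception would mean 31 queued commands were pending at once, while SAct 0x0 in the server log fits a non-NCQ command (READ MULTIPLE over PIO).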

It's easily reproducible by starting up MPD, which causes it quickly
when it accesses the music in my main $HOME directory. (I don't use MPD
on the laptop, so that's not the problem in any way.)

Could this possibly be a bug in something besides the kernel? That
might explain why the server started getting it despite not having a
new kernel. And I'd like to know before I go out buying more HDDs.

-- 
J. Randall Owens | http://www.ghiapet.net/
-- devel mailing list devel@xxxxxxxxxxxxxxxxxxxxxxx https://admin.fedoraproject.org/mailman/listinfo/devel