On 12/30/2009 12:16 PM, Tim Small wrote:
> Raman Gupta wrote:
>> However, note that I can make the "exception Emask 0x0 SAct 0x0 SErr
>> 0x0 action 0x6 frozen" happen even with the RAID array stopped and no
>> filesystems mounted. All I have to do is run the smartctl -a /dev/sdd
>> command (sdd is attached to the Marvell controller) repeatedly until
>> this exception occurs:
>>
>> Dec 27 18:59:30 x kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>> Dec 27 18:59:30 x kernel: ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>> Dec 27 18:59:30 x kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
>> Dec 27 18:59:30 x kernel: ata6.00: status: { DRDY }
>> Dec 27 18:59:30 x kernel: ata6: hard resetting link
>> Dec 27 18:59:30 x kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> Dec 27 18:59:30 x kernel: ata6.00: configured for UDMA/133
>> Dec 27 18:59:30 x kernel: ata6: EH complete
>>
>> Usually 10-15 executions is sufficient to replicate the issue.
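
For anyone who wants to spot this failure in their own logs, here is a minimal Python sketch that picks the "exception Emask" lines out of kernel log text. It assumes each kernel message sits on a single line (not wrapped by a mail client). The function name find_ata_exceptions and the regular expression are made up for illustration and are not part of libata or any existing tool.

```python
import re

# Matches the first line of a libata error report, for example:
#   ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
# The pattern is an assumption based on the log excerpt quoted above.
EXCEPTION_RE = re.compile(
    r"(?P<port>ata\d+(?:\.\d+)?): exception Emask (?P<emask>0x[0-9a-fA-F]+) "
    r"SAct (?P<sact>0x[0-9a-fA-F]+) SErr (?P<serr>0x[0-9a-fA-F]+) "
    r"action (?P<action>0x[0-9a-fA-F]+)(?P<frozen> frozen)?"
)


def find_ata_exceptions(lines):
    """Return one dict per 'exception Emask' line found in the given log lines."""
    found = []
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match is None:
            continue
        found.append({
            "port": match.group("port"),
            "emask": int(match.group("emask"), 16),
            "sact": int(match.group("sact"), 16),
            "serr": int(match.group("serr"), 16),
            "action": int(match.group("action"), 16),
            # True when the port was frozen by the error handler.
            "frozen": match.group("frozen") is not None,
        })
    return found
```

Feeding it the lines of /var/log/messages (or wherever your distribution keeps kernel messages) would give a quick count of how often the exception occurred and on which port.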
> Hmm. I wonder what running this script from this bug:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=14831
>
> against drives attached to other controllers would do? It doesn't do
> anything particularly special - just runs smartctl in a loop while also
> writing to the same drive (via fs using dd).
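
For readers without access to the bug report, the general approach (smartctl in a loop while dd writes to a filesystem on the same drive) can be sketched roughly as below. This is not the script attached to the bug, only an illustration of the same idea. The function names, the default timeout, and the example paths are assumptions. The failure check relies on smartctl's documented exit status, where bits 0-2 indicate that the command itself failed rather than that the disk reported bad health.

```python
import subprocess
import time

# smartctl's exit status is a bitmask. Bits 0-2 mean the command line did not
# parse, the device could not be opened, or a SMART/ATA command failed.
SMARTCTL_COMMAND_FAILED = 0x07


def start_write_load(target_file, megabytes):
    """Start dd writing zeros to target_file in the background."""
    return subprocess.Popen(
        ["dd", "if=/dev/zero", f"of={target_file}", "bs=1M", f"count={megabytes}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def poll_smartctl(device, max_runs, timeout_s=30.0, run=subprocess.run):
    """Run `smartctl -a device` up to max_runs times.

    Returns (run_number, elapsed_seconds) for the first failed or timed-out
    run, or None if every run succeeded. `run` is injectable for testing.
    """
    start = time.monotonic()
    for n in range(1, max_runs + 1):
        try:
            result = run(
                ["smartctl", "-a", device],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
            failed = bool(result.returncode & SMARTCTL_COMMAND_FAILED)
        except subprocess.TimeoutExpired:
            failed = True
        if failed:
            return n, time.monotonic() - start
    return None


def run_stress(device, target_file, megabytes, max_runs):
    """Write to target_file with dd while polling device with smartctl."""
    writer = start_write_load(target_file, megabytes)
    try:
        return poll_smartctl(device, max_runs)
    finally:
        writer.terminate()
        writer.wait()
```

A call such as run_stress("/dev/sdd", "/mnt/test/stress.bin", 4096, 1000) would need root privileges, and the mount point here is hypothetical: it should be a filesystem that lives on the drive under test.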
Against a Seagate ST3500418AS on the Marvell controller, the script
produced the first "smartctl failed" error in 55 seconds. Within about
8 minutes, everything went to pot and all drives on that controller
were completely inaccessible (all filesystem writes failed and the
kernel could not IDENTIFY the drives). As far as I can tell with my
multimeter, voltages were stable.
> Out of interest have you tried drives from other manufacturers?
Unfortunately, at the moment I don't have any non-Seagate drives
available.
> Would also be interested to see what happens if you run the script
> against the same drive, but attached to the ICH7?
The problem occurs against any of the three Seagate ST3500418AS drives
I have attached to the Marvell. Against the same model of drive
attached to my ICH7 controller, I canceled the script after it ran for
1.5 hours without any problems. So the problem appears to be exclusive
to the Marvell -- either the hardware or the driver.
Furthermore, over the last few days, I've had smartd and hddtemp
turned off for the Marvell drives, and they have been stable and
error-free.
Cheers,
Raman