Re: need help with ata error

Tejun Heo <htejun@xxxxxxxxx> · Fri, 09 Feb 2007 05:37:26 -0500

[cc'ing Mikael Pettersson, hi!]

Eyal Lebedinsky wrote:
> I recently added a 6th disk to a RAID5. All disks are WD 320GB SATA, of different
> Caviar models (SE, RE) and this new one is RE16.
>
> It worked well for about 5 days (completed a 20 hour grow OK). I now see the following
> messages logged (see at end). Can someone explain what it means? The raid5 is still
> up and it did not react to this. Being a mythtv repository it gets used regularly.
>
> Is this a disk issue? A controller issue (the new disk is now the fourth on a
> Promise SATA-II-150-TX4)? A kernel problem (2.6.20 vanilla).
>
> ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata6.00: cmd 25/00:b8:3f:c4:b6/00:00:20:00:00/e0 tag 0 cdb 0x0 data 94208 in
>          res 50/00:00:f6:c4:b6/00:00:00:00:00/e0 Emask 0x1 (device error)

Device error w/o ATA_ERR set?  Mikael, this seems to be coming from the
PDC_ERR_MASK test in pdc_host_intr().  AC_ERR_DEV means 'the attached
ATA/ATAPI device indicated error condition', so it isn't really
appropriate there, nor is calling pdc_reset_port() from the IRQ handler.
I guess this is left over from the old EH days.
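
To make the mismatch concrete: in the 'res' line the first byte is the
status register (0x50) and the second is the error register (0x00).  A
minimal user-space sketch, assuming the standard ATA status bit layout
(names follow linux/ata.h); hyp_device_flagged_error() is a made-up
helper for illustration, not a libata function:

```c
#include <stdbool.h>
#include <stdint.h>

/* ATA status register bits; names follow linux/ata.h. */
enum {
	ATA_BUSY = 1 << 7,	/* device busy */
	ATA_DRDY = 1 << 6,	/* device ready */
	ATA_DF   = 1 << 5,	/* device fault */
	ATA_DRQ  = 1 << 3,	/* data request */
	ATA_ERR  = 1 << 0,	/* error, details in the error register */
};

/*
 * Hypothetical helper (not part of libata): true only if the device
 * itself set ATA_ERR in the status byte of the result taskfile.
 */
static bool hyp_device_flagged_error(uint8_t status)
{
	return (status & ATA_ERR) != 0;
}
```

With the status from the log, 0x50 is DRDY (bit 6) plus bit 4 (DSC in
older ATA specs), so hyp_device_flagged_error(0x50) is false even
though the report says Emask 0x1 (device error).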

Unknown errors can use AC_ERR_OTHER, which is automatically cleared
if error diagnosis results in any real error mask.  I think what should
be done here is to record the IRQ mask using ata_ehi_push_desc() and set
a specific AC_ERR_* according to the IRQ mask, as ahci and sata_sil24 do.
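
Roughly what I have in mind, as a stand-alone user-space sketch rather
than a patch.  The AC_ERR_* values are copied from libata.h as I
remember them and should be checked against the tree.  Everything
prefixed HYP_/hyp_ is hypothetical: the IRQ bits are NOT the real
Promise register layout, and the snprintf() call only stands in for
ata_ehi_push_desc():

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* libata error mask flags (values assumed to match include/linux/libata.h). */
enum {
	AC_ERR_DEV      = 1 << 0,	/* device reported error */
	AC_ERR_HSM      = 1 << 1,	/* host state machine violation */
	AC_ERR_TIMEOUT  = 1 << 2,	/* timeout */
	AC_ERR_MEDIA    = 1 << 3,	/* media error */
	AC_ERR_ATA_BUS  = 1 << 4,	/* ATA bus error */
	AC_ERR_HOST_BUS = 1 << 5,	/* host bus error */
	AC_ERR_SYSTEM   = 1 << 6,	/* system error */
	AC_ERR_INVALID  = 1 << 7,	/* invalid argument */
	AC_ERR_OTHER    = 1 << 8,	/* unknown */
};

/* HYPOTHETICAL controller IRQ status bits, for illustration only. */
enum {
	HYP_IRQ_DEV_ERR    = 1 << 0,	/* device signalled an error */
	HYP_IRQ_LINK_ERR   = 1 << 1,	/* transmission error on the link */
	HYP_IRQ_PCI_ERR    = 1 << 2,	/* host bus (PCI) error */
	HYP_IRQ_VENDOR_ERR = 1 << 3,	/* error bit we cannot classify */
	HYP_IRQ_ERR_MASK   = HYP_IRQ_DEV_ERR | HYP_IRQ_LINK_ERR |
			     HYP_IRQ_PCI_ERR | HYP_IRQ_VENDOR_ERR,
};

/*
 * Record the raw IRQ status for the EH report and translate it into
 * specific AC_ERR_* flags.  Unclassified error bits add AC_ERR_OTHER.
 */
static unsigned int hyp_irq_to_err_mask(uint32_t irq_stat,
					char *desc, size_t desc_len)
{
	unsigned int err_mask = 0;

	/* Stand-in for ata_ehi_push_desc(ehi, "irq_stat 0x%08x", ...). */
	if (desc && desc_len)
		snprintf(desc, desc_len, "irq_stat 0x%08x",
			 (unsigned int)irq_stat);

	if (irq_stat & HYP_IRQ_DEV_ERR)
		err_mask |= AC_ERR_DEV;
	if (irq_stat & HYP_IRQ_LINK_ERR)
		err_mask |= AC_ERR_ATA_BUS;
	if (irq_stat & HYP_IRQ_PCI_ERR)
		err_mask |= AC_ERR_HOST_BUS;
	if (irq_stat & HYP_IRQ_VENDOR_ERR)
		err_mask |= AC_ERR_OTHER;

	return err_mask;
}

/* AC_ERR_OTHER is dropped as soon as any more specific flag is set. */
static unsigned int hyp_finalize_err_mask(unsigned int err_mask)
{
	if (err_mask & ~AC_ERR_OTHER)
		err_mask &= ~AC_ERR_OTHER;
	return err_mask;
}
```

The point of the second helper is that a driver can set AC_ERR_OTHER
freely for bits it does not understand: if diagnosis turns up anything
more specific, the generic flag goes away and only the real cause is
reported.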

Eyal, if the error doesn't repeat, you can ignore it.  It probably is a 
transient transmission problem, power fluctuation or whatever.

Thanks.

--
tejun