Hi Tejun,

I tested your patch. No more EH alarms in dmesg during an overnight test.
One question I have is whether this fix will mask genuine failures.

Thanks,
Fajun

On 8/24/06, Tejun Heo <htejun@xxxxxxxxx> wrote:
Tejun Heo wrote:
> Fajun Chen wrote:
>> Sil3124. That's the only chipset we use.
>
>>> > [30540.003174] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
>>> > 0x2 frozen
>>> > [30540.003259] ata1: (irq_stat 0x01100010, PHY RDY changed)
>
> Yeap, this message from sata_sil24. You're not getting any phy status
> changes bits in SError although the device is reporting phy rdy changed
> event. However, your 3124 is reporting 8b/10b decoding error threshold
> exceeded error interrupt. That could be related to the phyrdy status
> changed event. This happens only under heavy IO, right? How often does
> it occur in units of times per megabytes transferred?
>
> 8b/10b error is a recoverable FIS reception error. The interrupt bit
> (bit 24 of irq_stat) is only turned on if threshold count is exceeded,
> which is initialized to 0x8000 at the moment. This indicates that there
> are quite some number of transmission failures.
>

Sorry, I forgot to attach patch. Can you please try the attached patch?

--
tejun

--- a/drivers/scsi/sata_sil24.c
+++ b/drivers/scsi/sata_sil24.c
@@ -1034,9 +1034,9 @@ static void sil24_init_controller(struct
 		writel(PORT_CS_IRQ_WOC, port + PORT_CTRL_CLR);
 
 		/* Zero error counters. */
-		writel(0x8000, port + PORT_DECODE_ERR_THRESH);
-		writel(0x8000, port + PORT_CRC_ERR_THRESH);
-		writel(0x8000, port + PORT_HSHK_ERR_THRESH);
+		writel(0x0000, port + PORT_DECODE_ERR_THRESH);
+		writel(0x0000, port + PORT_CRC_ERR_THRESH);
+		writel(0x0000, port + PORT_HSHK_ERR_THRESH);
 		writel(0x0000, port + PORT_DECODE_ERR_CNT);
 		writel(0x0000, port + PORT_CRC_ERR_CNT);
 		writel(0x0000, port + PORT_HSHK_ERR_CNT);
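
As a reading aid for the log values quoted in this thread (irq_stat
0x01100010, SErr 0x80000), here is a minimal user-space sketch of how those
two words might be decoded. It is not driver code. The only bit position
stated in the thread is bit 24 of irq_stat (8b/10b threshold exceeded). The
helper names, the position of the PHY RDY change bit (bit 4), and the
16-bit "raw" shift are assumptions for illustration. The SError positions
(bit 16 for PHY RDY change, bit 19 for 10b-to-8b decode error) follow the
SATA SError register layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; positions other than bit 24 are assumptions. */
#define IRQ_RAW_SHIFT     16u          /* assumed: raw status in upper half */
#define IRQ_PHYRDY_CHG    (1u << 4)    /* assumed: PHY ready change */
#define IRQ_8B10B         (1u << 8)    /* raw copy is bit 24, per the thread */

#define SERR_PHYRDY_CHG   (1u << 16)   /* SError: PHY RDY changed */
#define SERR_10B_8B_ERR   (1u << 19)   /* SError: 10b-to-8b decode error */

/* Nonzero if the 8b/10b decode error threshold bit (bit 24) is set. */
static int raw_8b10b_threshold_hit(uint32_t irq_stat)
{
	return (irq_stat & (IRQ_8B10B << IRQ_RAW_SHIFT)) != 0;
}

/* Nonzero if the (assumed) PHY ready change bit is set in irq_stat. */
static int phyrdy_changed(uint32_t irq_stat)
{
	return (irq_stat & IRQ_PHYRDY_CHG) != 0;
}

/* Nonzero if SError reports a PHY RDY change. */
static int serr_phyrdy_changed(uint32_t serr)
{
	return (serr & SERR_PHYRDY_CHG) != 0;
}

/* Nonzero if SError reports a 10b-to-8b decode error. */
static int serr_decode_error(uint32_t serr)
{
	return (serr & SERR_10B_8B_ERR) != 0;
}
```

Under these assumptions, the quoted values match the description above:
irq_stat has both the threshold bit and the PHY RDY change bit set, while
SErr carries a decode error bit but no PHY RDY change bit.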