Re: aic94xx driver woes continued

"Raoul Bhatia [IPAX]" <r.bhatia@xxxxxxx> · Thu, 20 Mar 2008 20:18:57 +0100

Raoul Bhatia [IPAX] wrote:
> James Bottomley wrote:
>> This is all normal.  Seagate drives are known for throwing protocol
>> errors under stress at certain revs of firmware.  That's what
>> REQ_TASK_ABORT, reason=0x6 is.
>>
>> Your logs indicate that the recovery occurred correctly (as in all tasks
>> were eventually retried), so it doesn't show an actual problem.
>
> ok, i already filed a trouble ticket at seagate - let's see if they
> provide a firmware update for the disks. afaik mine is "firmware 0002"
>
>>> sometimes even a disk is kicked out of the raid configuration.
>>
>> This would be abnormal, if you have a log of this, could you post it.  I
>> assume it was because of I/O errors?
>
> i attached a bigger syslog file (.gz format).
>
> the errors look like:
>> syslog.1.gz:Mar 11 06:25:08 db-ipax-164 kernel: raid1: Disk failure on sda1, disabling device.
>> syslog.1.gz:Mar 11 06:25:01 db-ipax-164 kernel: raid10: Disk failure on sda7, disabling device.
>> syslog.1.gz:Mar 10 18:13:25 db-ipax-164 kernel: raid10: Disk failure on sda3, disabling device.
>> syslog.1.gz:Mar 10 18:13:23 db-ipax-164 kernel: raid10: Disk failure on sda9, disabling device.
>> syslog.1.gz:Mar 10 18:13:23 db-ipax-164 kernel: raid10: Disk failure on sda8, disabling device.
>> syslog.1.gz:Mar 10 18:13:23 db-ipax-164 kernel: raid10: Disk failure on sda5, disabling device.
>> syslog.0:Mar 18 18:30:48 db-ipax-164 kernel: raid10: Disk failure on sdd5, disabling device.
>> syslog.0:Mar 18 18:27:18 db-ipax-164 kernel: raid10: Disk failure on sdd8, disabling device.
>
> i will test the device by itself to see if it has errors.
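
to see which physical disk the md failures cluster on, the grep output can be
grouped per disk. this is only a rough sketch: the regex assumes the
"raidN: Disk failure on <partition>, disabling device." wording seen in the
lines quoted above, and the names FAILURE_RE, group_failures and SAMPLE are
made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: md failure lines look like the ones quoted in this thread, e.g.
#   "Mar 11 06:25:08 db-ipax-164 kernel: raid1: Disk failure on sda1, disabling device."
FAILURE_RE = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<level>raid\d+): Disk failure on (?P<disk>[a-z]+)(?P<num>\d+)"
)


def group_failures(lines):
    """Return {disk: [(timestamp, raid level, partition), ...]}."""
    by_disk = defaultdict(list)
    for line in lines:
        m = FAILURE_RE.search(line)
        if m is None:
            continue  # not an md "Disk failure" line
        partition = m.group("disk") + m.group("num")
        by_disk[m.group("disk")].append((m.group("stamp"), m.group("level"), partition))
    return dict(by_disk)


# The eight failure lines from the syslog excerpt, one entry per line.
SAMPLE = [
    "syslog.1.gz:Mar 11 06:25:08 db-ipax-164 kernel: raid1: Disk failure on sda1, disabling device.",
    "syslog.1.gz:Mar 11 06:25:01 db-ipax-164 kernel: raid10: Disk failure on sda7, disabling device.",
    "syslog.1.gz:Mar 10 18:13:25 db-ipax-164 kernel: raid10: Disk failure on sda3, disabling device.",
    "syslog.1.gz:Mar 10 18:13:23 db-ipax-164 kernel: raid10: Disk failure on sda9, disabling device.",
    "syslog.1.gz:Mar 10 18:13:23 db-ipax-164 kernel: raid10: Disk failure on sda8, disabling device.",
    "syslog.1.gz:Mar 10 18:13:23 db-ipax-164 kernel: raid10: Disk failure on sda5, disabling device.",
    "syslog.0:Mar 18 18:30:48 db-ipax-164 kernel: raid10: Disk failure on sdd5, disabling device.",
    "syslog.0:Mar 18 18:27:18 db-ipax-164 kernel: raid10: Disk failure on sdd8, disabling device.",
]

if __name__ == "__main__":
    for disk, events in sorted(group_failures(SAMPLE).items()):
        print(disk, len(events), sorted(part for _, _, part in events))
```

on the excerpt this groups six partition failures under sda and two under sdd,
which points at whole-disk events rather than isolated partition problems.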

ok, the first thing i notice is that smart reports a lot of corrected
read errors (110937 corrected by fast ECC, 0 uncorrected).

> Device: SEAGATE  ST373455SS       Version: 0002
> Serial number: 3LQ2591D00009819ULUZ
> Device type: disk
> Transport protocol: SAS
> Local Time is: Thu Mar 20 20:15:45 2008 CET
> Device supports SMART and is Enabled
> Temperature Warning Enabled
> SMART Health Status: OK
> ...
> Error counter log:
>            Errors Corrected by           Total   Correction     Gigabytes    Total
>                ECC          rereads/    errors   algorithm      processed    uncorrected
>            fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
> read:     110937        0         0    110937     110937        170.275           0
> write:         0        0         0         0          0  187651578.045           0
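
to put the read counters in proportion, the rows can be parsed and the
corrected errors divided by the gigabytes read. again just a sketch: it
assumes each "read:"/"write:" row carries the seven numeric columns in the
order of the header above, and FIELDS, parse_error_counter_row and
corrected_per_gb are hypothetical helper names, not part of smartctl.

```python
# Column order assumed from the "Error counter log" header in this mail.
FIELDS = (
    "ecc_fast",
    "ecc_delayed",
    "rereads_rewrites",
    "total_corrected",
    "algorithm_invocations",
    "gigabytes_processed",
    "total_uncorrected",
)


def parse_error_counter_row(row):
    """Parse one 'read:' or 'write:' row into (name, {field: number})."""
    name, sep, rest = row.partition(":")
    values = rest.split()
    if not sep or len(values) != len(FIELDS):
        raise ValueError(f"unexpected error counter row: {row!r}")
    parsed = {}
    for field, value in zip(FIELDS, values):
        # Only the gigabytes column is fractional; the rest are counters.
        parsed[field] = float(value) if field == "gigabytes_processed" else int(value)
    return name.strip(), parsed


def corrected_per_gb(counters):
    """Corrected errors per 10^9 bytes processed, or None if nothing was processed."""
    gigabytes = counters["gigabytes_processed"]
    if gigabytes == 0:
        return None
    return counters["total_corrected"] / gigabytes


# Rows copied from the smartctl output, with the wrapped lines rejoined.
ROWS = [
    "read:     110937        0         0    110937     110937        170.275           0",
    "write:         0        0         0         0          0  187651578.045           0",
]

if __name__ == "__main__":
    for row in ROWS:
        name, counters = parse_error_counter_row(row)
        print(name, counters["total_uncorrected"], corrected_per_gb(counters))
```

for the read row this comes to roughly 650 ECC-corrected errors per gigabyte
read, with zero uncorrected errors in either direction.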

i will try to upgrade to a new version of smartctl - maybe this will
reveal more information.

cheers,
raoul
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc.          email.          r.bhatia@xxxxxxx
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG         web.          http://www.ipax.at
Barawitzkagasse 10/2/2/11           email.            office@xxxxxxx
1190 Wien                           tel.               +43 1 3670030
FN 277995t HG Wien                  fax.            +43 1 3670030 15
____________________________________________________________________