Raoul Bhatia [IPAX] wrote:
> James Bottomley wrote:
>> This is all normal. Seagate drives are known for throwing protocol
>> errors under stress at certain revs of firmware. That's what
>> REQ_TASK_ABORT, reason=0x6 is.
>>
>> Your logs indicate that the recovery occurred correctly (as in all tasks
>> were eventually retried), so it doesn't show an actual problem.
>
> ok, i already filed a trouble ticket at seagate - lets see if they
> provide a firmware update for the disks. afaik mine is "firmware 0002"
>
>>> sometimes even a disk is kicked out of the raid configuration.
>>
>> This would be abnormal, if you have a log of this, could you post it. I
>> assume it was because of I/O errors?
>
> i attached a bigger syslog file (.gz format).
>
> the errors look like:
>> syslog.1.gz:Mar 11 06:25:08 db-ipax-164 kernel: raid1: Disk failure on sda1, disabling device.
>> syslog.1.gz:Mar 11 06:25:01 db-ipax-164 kernel: raid10: Disk failure on sda7, disabling device.
>> syslog.1.gz:Mar 10 18:13:25 db-ipax-164 kernel: raid10: Disk failure on sda3, disabling device.
>> syslog.1.gz:Mar 10 18:13:23 db-ipax-164 kernel: raid10: Disk failure on sda9, disabling device.
>> syslog.1.gz:Mar 10 18:13:23 db-ipax-164 kernel: raid10: Disk failure on sda8, disabling device.
>> syslog.1.gz:Mar 10 18:13:23 db-ipax-164 kernel: raid10: Disk failure on sda5, disabling device.
>> syslog.0:Mar 18 18:30:48 db-ipax-164 kernel: raid10: Disk failure on sdd5, disabling device.
>> syslog.0:Mar 18 18:27:18 db-ipax-164 kernel: raid10: Disk failure on sdd8, disabling device.
>
> i will test the device for itself to see if it has errors.
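
to see which physical disks get kicked most often, the failure lines above can be tallied per disk. this is only a rough sketch: the regex is modelled on the eight lines quoted above, the function name tally_failures is made up for illustration, and it assumes partition names of the form sdXN (sda7 belongs to sda).

```python
import re
from collections import Counter

# Modelled on the quoted messages, e.g.
# "kernel: raid10: Disk failure on sda7, disabling device."
FAILURE_RE = re.compile(
    r"kernel: (raid\d+): Disk failure on (\w+), disabling device"
)


def tally_failures(lines):
    """Count md 'Disk failure' messages per underlying disk.

    Assumption: partitions are named sdXN, so stripping the trailing
    digits from 'sda7' yields the disk name 'sda'.
    """
    per_disk = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            partition = match.group(2)
            disk = partition.rstrip("0123456789")
            per_disk[disk] += 1
    return per_disk


# The eight failure messages quoted in this mail, one per line.
sample = [
    "syslog.1.gz:Mar 11 06:25:08 db-ipax-164 kernel: raid1: Disk failure on sda1, disabling device.",
    "syslog.1.gz:Mar 11 06:25:01 db-ipax-164 kernel: raid10: Disk failure on sda7, disabling device.",
    "syslog.1.gz:Mar 10 18:13:25 db-ipax-164 kernel: raid10: Disk failure on sda3, disabling device.",
    "syslog.1.gz:Mar 10 18:13:23 db-ipax-164 kernel: raid10: Disk failure on sda9, disabling device.",
    "syslog.1.gz:Mar 10 18:13:23 db-ipax-164 kernel: raid10: Disk failure on sda8, disabling device.",
    "syslog.1.gz:Mar 10 18:13:23 db-ipax-164 kernel: raid10: Disk failure on sda5, disabling device.",
    "syslog.0:Mar 18 18:30:48 db-ipax-164 kernel: raid10: Disk failure on sdd5, disabling device.",
    "syslog.0:Mar 18 18:27:18 db-ipax-164 kernel: raid10: Disk failure on sdd8, disabling device.",
]

counts = tally_failures(sample)
for disk, n in sorted(counts.items()):
    print(f"{disk}: {n} partition(s) kicked")
```

on the quoted lines this groups six failures under sda and two under sdd, which matches the impression that whole disks drop out rather than single partitions.
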
ok, the first thing i notice is that smart reports a lot of errors:
> Device: SEAGATE ST373455SS Version: 0002
> Serial number: 3LQ2591D00009819ULUZ
> Device type: disk
> Transport protocol: SAS
> Local Time is: Thu Mar 20 20:15:45 2008 CET
> Device supports SMART and is Enabled
> Temperature Warning Enabled
> SMART Health Status: OK
> ...
> Error counter log:
>           Errors Corrected by          Total   Correction      Gigabytes      Total
>               ECC          rereads/    errors   algorithm      processed   uncorrected
>           fast | delayed   rewrites  corrected  invocations  [10^9 bytes]     errors
> read:    110937        0         0     110937     110937          170.275          0
> write:        0        0         0          0          0    187651578.045          0
i will try to upgrade to a new version of smartctl - maybe this will
reveal more information.
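
to put a number on "a lot", the read counters can be turned into a rate of corrected errors per gigabyte. this is just arithmetic on the two values from the smartctl output in this mail; the helper name corrected_per_gb is made up for illustration, and the result says nothing about whether such a rate is abnormal for this drive model.

```python
def corrected_per_gb(corrected_errors, gigabytes_processed):
    """Return corrected errors per gigabyte (10^9 bytes) processed."""
    if gigabytes_processed <= 0:
        raise ValueError("gigabytes_processed must be positive")
    return corrected_errors / gigabytes_processed


# Values copied from the 'read:' row of the error counter log:
# 110937 total corrected errors over 170.275 GB processed.
read_rate = corrected_per_gb(110937, 170.275)
print(f"{read_rate:.1f} corrected read errors per GB")
```
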
cheers,
raoul
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia@xxxxxxx
Technischer Leiter
IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office@xxxxxxx
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________