Re: read errors corrected

Lots of tremendous responses. I appreciate it. I'm going to reply to
the first person who responded here, but this email should cover some
of the questions posed in further responses.

On Thu, Dec 30, 2010 at 00:24, Mikael Abrahamsson <swmike@xxxxxxxxx> wrote:
> On Thu, 30 Dec 2010, James wrote:
>
>> Can someone point me in the right direction?
>> (a) what causes these errors precisely?
>
> dmesg should give you information if this is SATA errors.

Here are some other logs that may be relevant:

Dec 15 15:40:34 nuova kernel: sd 0:0:0:0: [sda] Unhandled error code
Dec 15 15:40:34 nuova kernel: sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x06
Dec 15 15:40:34 nuova kernel: sd 0:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 3b e3 53 ea 00 00 48 00
Dec 15 15:40:34 nuova kernel: end_request: I/O error, dev sda, sector 1004753898
Dec 15 15:40:34 nuova kernel: md/raid:md4: read error corrected (8 sectors at 974262528 on sda4)
Dec 15 15:40:34 nuova kernel: md/raid:md4: read error corrected (8 sectors at 974262536 on sda4)
Dec 15 15:40:34 nuova kernel: md/raid:md4: read error corrected (8 sectors at 974262544 on sda4)
Dec 15 15:40:34 nuova kernel: md/raid:md4: read error corrected (8 sectors at 974262552 on sda4)
Dec 15 15:40:34 nuova kernel: md/raid:md4: read error corrected (8 sectors at 974262560 on sda4)
Dec 15 15:40:34 nuova kernel: md/raid:md4: read error corrected (8 sectors at 974262568 on sda4)
Dec 15 15:40:34 nuova kernel: md/raid:md4: read error corrected (8 sectors at 974262576 on sda4)
Dec 15 15:40:34 nuova kernel: md/raid:md4: read error corrected (8 sectors at 974262584 on sda4)
Dec 15 15:40:34 nuova kernel: md/raid:md4: read error corrected (8 sectors at 974262592 on sda4)

Unfortunately, I did not catch those error messages at first glance.
An I/O error? Hrmm...that doesn't sound good. The same issue shows up
again later on:

Dec 29 03:04:01 nuova kernel: sd 1:0:1:0: [sdd] Unhandled error code
Dec 29 03:04:01 nuova kernel: sd 0:0:1:0: [sdb] Unhandled error code
Dec 29 03:04:01 nuova kernel: sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x06
Dec 29 03:04:01 nuova kernel: sd 0:0:1:0: [sdb] CDB: cdb[0]=0x28: 28 00 1b 06 d2 ea 00 00 78 00
Dec 29 03:04:01 nuova kernel: end_request: I/O error, dev sdb, sector 453432042
Dec 29 03:04:01 nuova kernel: sd 1:0:1:0: [sdd] Result: hostbyte=0x00 driverbyte=0x06
Dec 29 03:04:01 nuova kernel: sd 1:0:1:0: [sdd] CDB: cdb[0]=0x28: 28 00 1b 06 d2 62 00 00 88 00
Dec 29 03:04:01 nuova kernel: end_request: I/O error, dev sdd, sector 453431906
Dec 29 03:04:01 nuova kernel: raid5_end_read_request: 13 callbacks suppressed
Dec 29 03:04:01 nuova kernel: md/raid:md4: read error corrected (8 sectors at 422940552 on sdd4)
Dec 29 03:04:01 nuova kernel: md/raid:md4: read error corrected (8 sectors at 422940672 on sdb4)
Dec 29 03:04:01 nuova kernel: md/raid:md4: read error corrected (8 sectors at 422940680 on sdb4)
Dec 29 03:04:01 nuova kernel: md/raid:md4: read error corrected (8 sectors at 422940688 on sdb4)
Dec 29 03:04:01 nuova kernel: md/raid:md4: read error corrected (8 sectors at 422940696 on sdb4)
Dec 29 03:04:01 nuova kernel: md/raid:md4: read error corrected (8 sectors at 422940704 on sdb4)
Dec 29 03:04:01 nuova kernel: md/raid:md4: read error corrected (8 sectors at 422940712 on sdb4)
Dec 29 03:04:01 nuova kernel: md/raid:md4: read error corrected (8 sectors at 422940720 on sdb4)
Dec 29 03:04:01 nuova kernel: md/raid:md4: read error corrected (8 sectors at 422940728 on sdb4)
Dec 29 03:04:01 nuova kernel: md/raid:md4: read error corrected (8 sectors at 422940736 on sdb4)

Ouch.
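
As a sanity check on the kernel messages above, here is a minimal
Python sketch that decodes the logged CDB bytes. It assumes they follow
the standard SCSI READ(10) layout (opcode 0x28, big-endian LBA in bytes
2-5, big-endian transfer length in bytes 7-8); the function name
decode_read10 is my own, not anything from the kernel.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Returns (lba, transfer_length_in_sectors).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: number of blocks
    return lba, length


# CDBs copied from the log excerpts above.
logged = [
    ("sda", "28 00 3b e3 53 ea 00 00 48 00"),
    ("sdb", "28 00 1b 06 d2 ea 00 00 78 00"),
    ("sdd", "28 00 1b 06 d2 62 00 00 88 00"),
]

for dev, cdb in logged:
    lba, count = decode_read10(cdb)
    print(f"{dev}: LBA {lba}, {count} sectors")
```

The decoded LBAs (1004753898, 453432042, 453431906) match the sector
numbers in the end_request lines, and the sda read covers 72 sectors,
which lines up with the nine 8-sector "read error corrected" messages
that follow it.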

>> (c) are these errors expected in a RAID array that is heavily used?
>
> No.
>
>> (d) what kind of errors should I see regarding "read errors" that
>> *would* indicate an imminent hardware failure?
>
> You should look into the SMART information on the drives using smartctl.

All of the drives report a SMART status of "passed"...unfortunately
that isn't very verbose. :)

Is there something specific I should be looking at in my SMART status?

I also see hundreds and hundreds of lines in my /var/log/messages like
the following:


Dec 20 06:12:40 nuova smartd[22451]: Device: /dev/sda [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 47 to 46
Dec 20 07:12:39 nuova smartd[22451]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 68
Dec 20 07:12:39 nuova smartd[22451]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 32
Dec 20 07:12:40 nuova smartd[22451]: Device: /dev/sdd [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 68
Dec 20 07:12:40 nuova smartd[22451]: Device: /dev/sdd [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 32
Dec 20 07:42:40 nuova smartd[22451]: Device: /dev/sdd [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 46 to 45
Dec 20 08:12:39 nuova smartd[22451]: Device: /dev/sdb [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 42 to 41
Dec 20 08:42:39 nuova smartd[22451]: Device: /dev/sda [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 46 to 45
Dec 20 08:42:39 nuova smartd[22451]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 66 to 67
Dec 20 08:42:39 nuova smartd[22451]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 34 to 33
Dec 20 09:42:39 nuova smartd[22451]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 68
Dec 20 09:42:39 nuova smartd[22451]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 32
Dec 20 10:12:39 nuova smartd[22451]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 68 to 67
Dec 20 10:12:39 nuova smartd[22451]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 32 to 33
Dec 20 10:12:39 nuova smartd[22451]: Device: /dev/sdd [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 45 to 44
Dec 20 11:12:40 nuova smartd[22451]: Device: /dev/sdb [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 41 to 40
Dec 20 13:42:39 nuova smartd[22451]: Device: /dev/sdd [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 44 to 43
Dec 20 14:42:40 nuova smartd[22451]: Device: /dev/sdb [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 40 to 39
Dec 20 15:42:39 nuova smartd[22451]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 68 to 67
Dec 20 15:42:39 nuova smartd[22451]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 32 to 33
Dec 20 15:42:40 nuova smartd[22451]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 66
Dec 20 15:42:40 nuova smartd[22451]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 34
Dec 20 15:42:40 nuova smartd[22451]: Device: /dev/sdd [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 68 to 67
Dec 20 15:42:40 nuova smartd[22451]: Device: /dev/sdd [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 32 to 33
Dec 20 16:12:39 nuova smartd[22451]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 68
Dec 20 16:12:39 nuova smartd[22451]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 32
Dec 20 16:12:40 nuova smartd[22451]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 66 to 67
Dec 20 16:12:40 nuova smartd[22451]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 34 to 33
Dec 20 16:12:40 nuova smartd[22451]: Device: /dev/sdd [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 68
Dec 20 16:12:40 nuova smartd[22451]: Device: /dev/sdd [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 32
Dec 20 16:42:39 nuova smartd[22451]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 68 to 67
Dec 20 16:42:39 nuova smartd[22451]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 32 to 33
Dec 20 16:42:39 nuova smartd[22451]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 66
Dec 20 16:42:39 nuova smartd[22451]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 34
Dec 20 16:42:40 nuova smartd[22451]: Device: /dev/sdd [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 68 to 67
Dec 20 16:42:40 nuova smartd[22451]: Device: /dev/sdd [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 32 to 33
Dec 20 17:12:39 nuova smartd[22451]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 68

Is it normal for SMART to update the attributes as the drives are
being used? (I've never had SMART installed before, so this is all
very new to me).
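
To make those hundreds of smartd lines easier to read, something like
the following rough Python sketch could condense them to one entry per
drive and attribute. It assumes each message sits on a single line in
/var/log/messages (the excerpt above may be wrapped by my mail client)
and that the wording matches what is shown here; the regular expression
and the summarize() helper are my own, not part of smartmontools.

```python
import re

# Matches the smartd "changed from X to Y" messages quoted above.
PATTERN = re.compile(
    r"Device: (?P<dev>\S+) \[SAT\], SMART Usage Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def summarize(lines):
    """Return {(device, attribute): (first_value, last_value, n_changes)}."""
    summary = {}
    for line in lines:
        m = PATTERN.search(line)
        if m is None:
            continue  # not an attribute-change message
        key = (m["dev"], m["name"])
        old, new = int(m["old"]), int(m["new"])
        # Keep the first value ever seen; update the latest value and count.
        first, _, count = summary.get(key, (old, old, 0))
        summary[key] = (first, new, count + 1)
    return summary


# Three sda messages copied from the excerpt above, each on one line.
sample = [
    "Dec 20 06:12:40 nuova smartd[22451]: Device: /dev/sda [SAT], SMART "
    "Usage Attribute: 195 Hardware_ECC_Recovered changed from 47 to 46",
    "Dec 20 07:12:39 nuova smartd[22451]: Device: /dev/sda [SAT], SMART "
    "Usage Attribute: 194 Temperature_Celsius changed from 33 to 32",
    "Dec 20 08:42:39 nuova smartd[22451]: Device: /dev/sda [SAT], SMART "
    "Usage Attribute: 195 Hardware_ECC_Recovered changed from 46 to 45",
]

for (dev, name), (first, last, count) in sorted(summarize(sample).items()):
    print(f"{dev} {name}: {first} -> {last} over {count} change(s)")
```

Run against the real log with open("/var/log/messages"), this would
show at a glance which attributes only wobble by one (the temperature
ones) and which move steadily in one direction.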

-james

> --
> Mikael Abrahamsson    email: swmike@xxxxxxxxx
>