Re: Spurious HD convictions

Hi Leslie,

According to some of the links here:
http://www.google.com/search?hl=en&q=failed+to+read+SCR+1+(Emask%3D0x40)

The cause seems to be either the Power Supply Unit (PSU) or the Port Multiplier (PM).

A quick workaround seems to be disabling NCQ on all affected devices.
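
For illustration, here is a minimal sketch of how NCQ could be turned off at runtime. It assumes the standard Linux sysfs attribute /sys/block/<dev>/device/queue_depth, where a depth of 1 disables NCQ for that drive. The function name disable_ncq and the sysfs_root parameter are hypothetical, introduced only for this example, and the script must run as root. The setting does not survive a reboot.

```python
from pathlib import Path


def disable_ncq(devices, sysfs_root="/sys/block"):
    """Set queue_depth to 1 for each device, which turns NCQ off.

    Returns a dict mapping each device name to its previous queue depth
    (as a string) so the old value can be restored later.
    sysfs_root is a hypothetical parameter that allows a dry run
    against a scratch directory instead of the real sysfs.
    """
    previous = {}
    for dev in devices:
        depth_file = Path(sysfs_root) / dev / "device" / "queue_depth"
        # Remember the current depth before overwriting it.
        previous[dev] = depth_file.read_text().strip()
        # A queue depth of 1 means only one outstanding command: no NCQ.
        depth_file.write_text("1\n")
    return previous
```

For the drives named in the log below, the call would be disable_ncq(["sde", "sdf", "sdg", "sdh"]). If the array then survives a full resync, that points at the NCQ path through the PM rather than at the drives themselves.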

On Sun, Dec 13, 2009 at 5:02 AM, lrhorer@xxxxxxxxxxx
<lrhorer@xxxxxxxxxxx> wrote:
>
>        What's happening here?  Suddenly, my backup server is suffering apparently
> spurious hard drive convictions.  The server is running RAID5 on 7 disks
> under md.  It has been running well for months, but suddenly it has started
> kicking drives from the array when under moderately heavy read or write
> loads.  The thing is, it isn't convicting any particular drive repeatedly,
> and the drives are not showing any errors under SMART.  This is a PM system,
> and I have tried changing the drive adapters, changing the PMs, changing
> cables, moving the drives around, and moving them out of the CPU enclosure to
> a new external chassis.  The convictions are not occurring on any one
> channel, over any one particular PM, or over any particular cable.  Since
> this started happening, I have been unable to get all the way through a
> resync before the array dumps at least one of the drives.  Here is a sample
> from the kernel log during one of the convictions:
>
> Dec 12 13:03:39 Backup kernel: [56319.397992] ata6.00: failed to read SCR 1 (Emask=0x40)
> Dec 12 13:03:39 Backup kernel: [56319.397999] ata6.01: failed to read SCR 1 (Emask=0x40)
> Dec 12 13:03:39 Backup kernel: [56319.398001] ata6.02: failed to read SCR 1 (Emask=0x40)
> Dec 12 13:03:39 Backup kernel: [56319.398006] ata6.03: failed to read SCR 1 (Emask=0x40)
> Dec 12 13:03:39 Backup kernel: [56319.398008] ata6.04: failed to read SCR 1 (Emask=0x40)
> Dec 12 13:03:39 Backup kernel: [56319.398010] ata6.05: failed to read SCR 1 (Emask=0x40)
> Dec 12 13:03:39 Backup kernel: [56319.398014] ata6.15: exception Emask 0x4 SAct 0x0 SErr 0x0 action 0x6 frozen
> Dec 12 13:03:39 Backup kernel: [56319.398018] ata6.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
> Dec 12 13:03:39 Backup kernel: [56319.398022] ata6.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 2
> Dec 12 13:03:39 Backup kernel: [56319.398023]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Dec 12 13:03:39 Backup kernel: [56319.398025] ata6.00: status: { DRDY }
> Dec 12 13:03:39 Backup kernel: [56319.398028] ata6.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
> Dec 12 13:03:39 Backup kernel: [56319.398031] ata6.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
> Dec 12 13:03:39 Backup kernel: [56319.398034] ata6.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
> Dec 12 13:03:39 Backup kernel: [56319.398037] ata6.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
> Dec 12 13:03:39 Backup kernel: [56319.398040] ata6.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
> Dec 12 13:03:39 Backup kernel: [56319.398044] ata6.15: hard resetting link
> Dec 12 13:03:41 Backup kernel: [56321.597384] ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> Dec 12 13:03:41 Backup kernel: [56321.597864] ata6.00: hard resetting link
> Dec 12 13:03:42 Backup kernel: [56321.933843] ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> Dec 12 13:03:42 Backup kernel: [56321.933849] ata6.01: hard resetting link
> Dec 12 13:03:42 Backup kernel: [56322.294048] ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Dec 12 13:03:42 Backup kernel: [56322.294055] ata6.02: hard resetting link
> Dec 12 13:03:42 Backup kernel: [56322.642243] ata6.02: SATA link down (SStatus 0 SControl 320)
> Dec 12 13:03:42 Backup kernel: [56322.646087] ata6.03: hard resetting link
> Dec 12 13:03:43 Backup kernel: [56323.006393] ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Dec 12 13:03:43 Backup kernel: [56323.006400] ata6.04: hard resetting link
> Dec 12 13:03:43 Backup kernel: [56323.354708] ata6.04: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> Dec 12 13:03:43 Backup kernel: [56323.354714] ata6.05: hard resetting link
> Dec 12 13:03:43 Backup kernel: [56323.690211] ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
> Dec 12 13:03:43 Backup kernel: [56323.694555] ata6.00: configured for UDMA/100
> Dec 12 13:03:43 Backup kernel: [56323.695732] ata6.01: configured for UDMA/100
> Dec 12 13:03:44 Backup kernel: [56323.703212] ata6.03: configured for UDMA/100
> Dec 12 13:03:44 Backup kernel: [56323.803119] ata6.04: configured for UDMA/100
> Dec 12 13:03:44 Backup kernel: [56323.803188] ata6: EH complete
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] 2930277168 512-byte hardware sectors (1500302 MB)
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] Write Protect is off
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] Mode Sense: 00 3a 00 00
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] 2930277168 512-byte hardware sectors (1500302 MB)
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] Write Protect is off
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] Mode Sense: 00 3a 00 00
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] 2930277168 512-byte hardware sectors (1500302 MB)
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] Write Protect is off
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] Mode Sense: 00 3a 00 00
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] 625142448 512-byte hardware sectors (320073 MB)
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] Write Protect is off
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] Mode Sense: 00 3a 00 00
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] 2930277168 512-byte hardware sectors (1500302 MB)
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] Write Protect is off
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] Mode Sense: 00 3a 00 00
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] 2930277168 512-byte hardware sectors (1500302 MB)
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] Write Protect is off
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] Mode Sense: 00 3a 00 00
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] 2930277168 512-byte hardware sectors (1500302 MB)
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] Write Protect is off
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] Mode Sense: 00 3a 00 00
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] 625142448 512-byte hardware sectors (320073 MB)
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] Write Protect is off
> Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] Mode Sense: 00 3a 00 00
> Dec 12 13:03:44 Backup kernel: [56323.807115] sd 5:4:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Dec 12 13:03:44 Backup kernel: [56323.839100] end_request: I/O error, dev sde, sector 10
> Dec 12 13:03:44 Backup kernel: [56323.839100] md: super_written gets error=-5, uptodate=0
> Dec 12 13:03:44 Backup kernel: [56323.839100] raid5: Disk failure on sde, disabling device.
> Dec 12 13:03:44 Backup kernel: [56323.839100] raid5: Operation continuing on 6 devices.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
       Majed B.
