Re: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive

Gavin Flower <gavinflower@xxxxxxxxx> · Thu, 14 Apr 2011 14:12:01 -0700 (PDT)

--- On Fri, 15/4/11, Phil Turmel <philip@xxxxxxxxxx> wrote:

> From: Phil Turmel <philip@xxxxxxxxxx>
> Subject: Re: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
> To: "Gavin Flower" <gavinflower@xxxxxxxxx>
> Cc: "Mathias Burén" <mathias.buren@xxxxxxxxx>, neilb@xxxxxxx, linux-raid@xxxxxxxxxxxxxxx
> Date: Friday, 15 April, 2011, 1:16
> Hi Gavin,
> 
> I think you might want to investigate your *power supply*
> ...
> 
> On 04/13/2011 08:15 PM, Gavin Flower wrote:
> 
> [snip /]
> 
> > SMART Attributes Data Structure revision number: 10
> > Vendor Specific SMART Attributes with Thresholds:
> > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> >   1 Raw_Read_Error_Rate     0x000f   115   099   006    Pre-fail  Always       -       87918991
> >   3 Spin_Up_Time            0x0003   099   097   000    Pre-fail  Always       -       0
> >   4 Start_Stop_Count        0x0032   085   085   020    Old_age   Always       -       16014
> >   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
> >   7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       20251386
> >   9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2940
> >  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
> >  12 Power_Cycle_Count       0x0032   093   093   020    Old_age   Always       -       7999
> 
> SMOKING GUN                                                                              ^^^^
> 
> I suspect your power supply is good enough to slowly spin
> up your drives and get them talking, but when you ask them
> to work hard, especially when writing, the PS voltage dips
> enough to reset the drive.
> 
> Look up all the power consumption specs for all of your
> components, and add up the *peak* current
> requirements.  Make sure your PS can handle it.
> 
> HTH,
> 
> Phil
> 

Hi Phil,

I was under the impression that I had an adequate power supply, so I checked all 5 drives.  In fact, I made a table comparing all the SMART entries; the differences I thought were significant follow below.  I have the full comparison table and the original SMART output in an OpenDocument file, which I will attach to a separate email (in case it gets blocked or dropped).

Note that Power_Cycle_Count is anomalous only for /dev/sdc, so would this suggest cable problems?

I am not sure what to make of the other discrepancies.

Note that sda, sdb, sdd, & sde were bought and put in at the same time, while sdc was only obtained and inserted recently.

ID# ATTRIBUTE_NAME           sda      sdb      sdc      sdd      sde
  4 Start_Stop_Count         720      716    16021    65535      713
  5 Reallocated_Sector_Ct     17       42        0        1       79
  9 Power_On_Hours         12505    12500     2960    12405    12475
 12 Power_Cycle_Count        720      716     7999      719      713
188 Command_Timeout         1040        1        1        0        4
189 High_Fly_Writes            1        0        0        0        0
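
The comparison can also be done mechanically. What follows is only a minimal sketch in Python: the raw values are copied from the table in this message, while the helper names (`cycles_per_hour`, `outliers`) and the "more than 10 times the median" threshold are arbitrary assumptions for illustration, not anything smartctl provides.

```python
from statistics import median

DRIVES = ["sda", "sdb", "sdc", "sdd", "sde"]

# Raw SMART values copied from the comparison table in this message.
SMART_RAW = {
    "Start_Stop_Count":      [720, 716, 16021, 65535, 713],
    "Reallocated_Sector_Ct": [17, 42, 0, 1, 79],
    "Power_On_Hours":        [12505, 12500, 2960, 12405, 12475],
    "Power_Cycle_Count":     [720, 716, 7999, 719, 713],
    "Command_Timeout":       [1040, 1, 1, 0, 4],
    "High_Fly_Writes":       [1, 0, 0, 0, 0],
}


def cycles_per_hour(smart=SMART_RAW):
    """Return power cycles per power-on hour for each drive."""
    pairs = zip(DRIVES, smart["Power_Cycle_Count"], smart["Power_On_Hours"])
    return {drive: cycles / hours for drive, cycles, hours in pairs}


def outliers(values, factor=10):
    """Return drives whose value exceeds `factor` times the median.

    The factor is an arbitrary illustrative threshold; the median is
    floored at 1 so that an all-zero majority does not flag everything.
    """
    limit = factor * max(median(values), 1)
    return [drive for drive, value in zip(DRIVES, values) if value > limit]


if __name__ == "__main__":
    for drive, rate in cycles_per_hour().items():
        print(f"{drive}: {rate:.3f} power cycles per power-on hour")
    for name, values in SMART_RAW.items():
        print(f"{name}: {outliers(values) or 'none'}")
```

With these numbers, sdc works out to roughly 2.7 power cycles per power-on hour, against about 0.06 for each of the other four drives.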

Only /dev/sda has any errors logged.  The 6th error occurred at disk power-on lifetime 12416 hours (517 days + 8 hours):

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 26 52 c2 0c

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 a8 97 51 c2 4c 00      00:07:58.408  READ FPDMA QUEUED
  60 00 00 3f 52 c2 4c 00      00:07:58.407  READ FPDMA QUEUED
  60 00 00 3f 53 c2 4c 00      00:07:58.407  READ FPDMA QUEUED
  60 00 28 3f 54 c2 4c 00      00:07:58.407  READ FPDMA QUEUED
  60 00 18 67 54 c2 4c 00      00:07:58.407  READ FPDMA QUEUED
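
For reading the register dump, here is a rough Python sketch that decodes the ER and ST bytes and reassembles the sector address. It assumes the conventional ATA task-file bit assignments and a 28-bit LBA interpretation of SN/CL/CH/DH; the queued commands in the log are 48-bit, so the address may be incomplete. The bit tables list only the commonly documented bits, and the helper names are made up for illustration.

```python
# Assumed ATA bit names; only the commonly documented bits are listed.
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC", 3: "DRQ", 0: "ERR"}


def decode_bits(value, names):
    """Return the names of the set bits, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if value & (1 << bit)]


def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA: low nibble of DH, then CH, CL, SN."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values copied from the error log in this message.
labels = "ER ST SC SN CL CH DH".split()
values = [int(x, 16) for x in "40 51 00 26 52 c2 0c".split()]
regs = dict(zip(labels, values))

if __name__ == "__main__":
    print("error :", decode_bits(regs["ER"], ERROR_BITS))
    print("status:", decode_bits(regs["ST"], STATUS_BITS))
    lba = lba28(regs["SN"], regs["CL"], regs["CH"], regs["DH"])
    print(f"LBA   : 0x{lba:08x} = {lba}")
```

Under those assumptions, ER 0x40 is an uncorrectable read error (UNC) and the failing address is 0x0cc25226.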
