--- On Fri, 15/4/11, Phil Turmel <philip@xxxxxxxxxx> wrote:

> From: Phil Turmel <philip@xxxxxxxxxx>
> Subject: Re: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
> To: "Gavin Flower" <gavinflower@xxxxxxxxx>
> Cc: "Mathias Burén" <mathias.buren@xxxxxxxxx>, neilb@xxxxxxx, linux-raid@xxxxxxxxxxxxxxx
> Date: Friday, 15 April, 2011, 1:16
>
> Hi Gavin,
>
> I think you might want to investigate your *power supply* ...
>
> On 04/13/2011 08:15 PM, Gavin Flower wrote:
>
> [snip /]
>
> > SMART Attributes Data Structure revision number: 10
> > Vendor Specific SMART Attributes with Thresholds:
> > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> >   1 Raw_Read_Error_Rate     0x000f   115   099   006    Pre-fail  Always       -       87918991
> >   3 Spin_Up_Time            0x0003   099   097   000    Pre-fail  Always       -       0
> >   4 Start_Stop_Count        0x0032   085   085   020    Old_age   Always       -       16014
> >   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
> >   7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       20251386
> >   9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2940
> >  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
> >  12 Power_Cycle_Count       0x0032   093   093   020    Old_age   Always       -       7999
>
> SMOKING GUN                                                                            ^^^^
>
> I suspect your power supply is good enough to slowly spin up your drives and get them talking, but when you ask them to work hard, especially when writing, the PS voltage dips enough to reset the drive.
>
> Look up all the power consumption specs for all of your components, and add up the *peak* current requirements. Make sure your PS can handle it.
>
> HTH,
>
> Phil

Hi Phil,

I was under the impression that I had an adequate power supply, so I checked all 5 drives. In fact I made a table to compare all the smart entries. The differences I thought were significant follow later.
I have the full comparison table, and the original smart output, in an OpenDocument file, which I will attach to a separate email (in case it gets blocked/dropped or some such).

Note that Power_Cycle_Count is anomalous only for /dev/sdc, so would this suggest cable problems? I am not sure what to make of the other discrepancies.

Note that sda, sdb, sdd, & sde were bought and put in at the same time, while sdc was only obtained and inserted recently.

                            sda     sdb     sdc     sdd     sde
  4 Start_Stop_Count        720     716   16021   65535     713
  5 Reallocated_Sector_Ct    17      42       0       1      79
  9 Power_On_Hours        12505   12500    2960   12405   12475
 12 Power_Cycle_Count       720     716    7999     719     713
188 Command_Timeout        1040       1       1       0       4
189 High_Fly_Writes           1       0       0       0       0

Only /dev/sda has any errors logged. The 6th error occurred at disk power-on lifetime 12416 hours (517 days + 8 hours).

When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 26 52 c2 0c

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
60 00 a8 97 51 c2 4c 00      00:07:58.408  READ FPDMA QUEUED
60 00 00 3f 52 c2 4c 00      00:07:58.407  READ FPDMA QUEUED
60 00 00 3f 53 c2 4c 00      00:07:58.407  READ FPDMA QUEUED
60 00 28 3f 54 c2 4c 00      00:07:58.407  READ FPDMA QUEUED
60 00 18 67 54 c2 4c 00      00:07:58.407  READ FPDMA QUEUED
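
In case it helps anyone repeat the comparison, here is a rough Python sketch of the same idea: take the raw values from the table above and flag any drive whose value is far above the median of the other four. The names (DRIVES, RAW, outliers, report) and the factor of 5 are my own assumptions for illustration, not anything from smartmontools, and the check only looks for unusually high counts.

```python
from statistics import median

# Raw values copied by hand from the table above, in the order sda..sde.
DRIVES = ["sda", "sdb", "sdc", "sdd", "sde"]
RAW = {
    "Start_Stop_Count":      [720, 716, 16021, 65535, 713],
    "Reallocated_Sector_Ct": [17, 42, 0, 1, 79],
    "Power_On_Hours":        [12505, 12500, 2960, 12405, 12475],
    "Power_Cycle_Count":     [720, 716, 7999, 719, 713],
    "Command_Timeout":       [1040, 1, 1, 0, 4],
    "High_Fly_Writes":       [1, 0, 0, 0, 0],
}


def outliers(values, factor=5):
    """Return indices whose value exceeds `factor` times the median
    of the remaining drives. The factor is an arbitrary assumption."""
    flagged = []
    for i, v in enumerate(values):
        others = values[:i] + values[i + 1:]
        baseline = max(median(others), 1)  # guard against a zero baseline
        if v > factor * baseline:
            flagged.append(i)
    return flagged


def report(raw=RAW, drives=DRIVES):
    """Map each attribute name to the list of drives flagged as high."""
    return {name: [drives[i] for i in outliers(vals)]
            for name, vals in raw.items()}


if __name__ == "__main__":
    for name, flagged in report().items():
        print(f"{name:24s} {', '.join(flagged) or '-'}")
```

A flag here only means "different from its neighbours"; it says nothing about whether the cause is the power supply, cabling, or the drive itself.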