Re: First experience with drive being kicked

Mark Knecht <markknecht@xxxxxxxxx> · Tue, 13 Apr 2010 19:24:22 -0700

On Tue, Apr 13, 2010 at 3:54 PM, Mark Knecht <markknecht@xxxxxxxxx> wrote:
> OK, I was messing around in the box today adding two more drives and I
> probably hit a cable or something but maybe not. /dev/md3 was
> affected, but md5 built on the same drives wasn't. Possibly this has
> been there for a day or two and I didn't notice it.  These drives are
> only a few days old so I hope I'm not seeing some sort of early
> problem. Supposedly good drives - WD 500GB RAID Edition.
>
> Currently all my RAIDs are RAID1 assembled by the kernel at boot time.
> I have no mdadm.conf file. mdadm is a running daemon.
>
> From dmesg:
>
> md: considering sdb3 ...
> md:  adding sdb3 ...
> md:  adding sdc3 ...
> md:  adding sda3 ...
> md: created md3
> md: bind<sda3>
> md: bind<sdc3>
> md: bind<sdb3>
> md: running: <sdb3><sdc3><sda3>
> md: kicking non-fresh sdb3 from array!
> md: unbind<sdb3>
> md: export_rdev(sdb3)
> raid1: raid set md3 active with 2 out of 3 mirrors
> md3: detected capacity change from 0 to 53694562304
>
> How do I go about getting /dev/sdb3 back into the array, and what
> sort of checking is advised when this happens before I add it back?
> The bad drive (sdb) doesn't look much different than the good drives.
> (sda shown, sdc)
>
> cruncher ~ # smartctl -A /dev/sdb
> smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0027   239   236   021    Pre-fail  Always       -       1016
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       87
>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       11
> 194 Temperature_Celsius     0x0022   109   105   000    Old_age   Always       -       38
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
>
> cruncher ~ # smartctl -A /dev/sda
> smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0027   239   235   021    Pre-fail  Always       -       1016
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       87
>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       11
> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       12
> 194 Temperature_Celsius     0x0022   108   106   000    Old_age   Always       -       39
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
>
> cruncher ~ #
>
> Thanks,
> Mark
>

Hopefully the process I used below to re-add sdb3 is basically correct.

- Mark

cruncher ~ # man mdadm
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdc3[2] sda3[0]
      52436096 blocks [3/2] [U_U]

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
cruncher ~ # mdadm /dev/md3 -f /dev/sdb3
mdadm: set device faulty failed for /dev/sdb3:  No such device
cruncher ~ # mdadm /dev/md3 -r /dev/sdb3
mdadm: hot remove failed for /dev/sdb3: No such device or address
cruncher ~ # fdisk /dev/sdb

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): p

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x703d11ba

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1           7       56196   83  Linux
/dev/sdb2               8         530     4200997+  82  Linux swap / Solaris
/dev/sdb3             536        7063    52436160   fd  Linux raid autodetect
/dev/sdb4            7064       60801   431650485    5  Extended
/dev/sdb5            7064       13591    52436128+  fd  Linux raid autodetect

Command (m for help): q

cruncher ~ # mdadm /dev/md3 -a /dev/sdb3
mdadm: re-added /dev/sdb3
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdb3[3] sdc3[2] sda3[0]
      52436096 blocks [3/2] [U_U]
      [>....................]  recovery =  1.3% (695488/52436096) finish=8.6min speed=99355K/sec

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdb3[3] sdc3[2] sda3[0]
      52436096 blocks [3/2] [U_U]
      [===========>.........]  recovery = 56.3% (29540736/52436096) finish=5.0min speed=75950K/sec

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdb3[1] sdc3[2] sda3[0]
      52436096 blocks [3/3] [UUU]

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
cruncher ~ #
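
Since the degraded state above may have gone unnoticed for a day or two, a small check of /proc/mdstat could flag it sooner. What follows is only a rough sketch, not anything mdadm itself provides: the function name degraded_arrays and the two regular expressions are made up for illustration, and it assumes the mdstat layout shown in this message (an "mdN : active ..." header followed by a "[expected/active] [U_U]" status line). Other kernels or array states may print lines it does not handle.

```python
import re

# Hypothetical patterns, based only on the /proc/mdstat output shown in this post.
# Array header, e.g. "md3 : active raid1 sdc3[2] sda3[0]".
ARRAY_RE = re.compile(r"^(md\d+) : (\w+) (\w+) (.+)$")
# Member summary, e.g. "[3/2] [U_U]" (expected/active counts, then per-slot state).
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active, slot_map)} for arrays missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (expected, active, status.group(3))
            current = None
    return degraded


# Sample copied from the first "cat /proc/mdstat" in this message.
SAMPLE = """\
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdc3[2] sda3[0]
      52436096 blocks [3/2] [U_U]

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, (expected, active, slots) in degraded_arrays(SAMPLE).items():
        print(f"{name}: {active} of {expected} members active, slots {slots}")
```

On a live system the same function could be fed open("/proc/mdstat").read() instead of SAMPLE. The raid0 array (md11) has no "[n/m]" status line here, so it is simply skipped.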