On Tue, Apr 13, 2010 at 3:54 PM, Mark Knecht <markknecht@xxxxxxxxx> wrote:
> OK, I was messing around in the box today adding two more drives and I
> probably hit a cable or something but maybe not. /dev/md3 was
> affected, but md5 built on the same drives wasn't. Possibly this has
> been there for a day or two and I didn't notice it. These drives are
> only a few days old so I hope I'm not seeing some sort of early
> problem. Supposedly good drives - WD 500GB RAID Edition.
>
> Currently all my RAIDs are RAID1 assembled by the kernel at boot time.
> I have no mdadm.conf file. mdadm is a running daemon.
>
> From dmesg:
>
> md: considering sdb3 ...
> md: adding sdb3 ...
> md: adding sdc3 ...
> md: adding sda3 ...
> md: created md3
> md: bind<sda3>
> md: bind<sdc3>
> md: bind<sdb3>
> md: running: <sdb3><sdc3><sda3>
> md: kicking non-fresh sdb3 from array!
> md: unbind<sdb3>
> md: export_rdev(sdb3)
> raid1: raid set md3 active with 2 out of 3 mirrors
> md3: detected capacity change from 0 to 53694562304
>
> How do I go about trying to add /dev/sdb3 back into the array and what
> sort of checking is advised when this happens before I add it back?
> The bad drive (sdb) doesn't look much different than the good drives.
> (sda shown, sdc)
>
> cruncher ~ # smartctl -A /dev/sdb
> smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0027   239   236   021    Pre-fail  Always       -       1016
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       87
>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       11
> 194 Temperature_Celsius     0x0022   109   105   000    Old_age   Always       -       38
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
>
> cruncher ~ # smartctl -A /dev/sda
> smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0027   239   235   021    Pre-fail  Always       -       1016
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       87
>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       11
> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       12
> 194 Temperature_Celsius     0x0022   108   106   000    Old_age   Always       -       39
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
>
> cruncher ~ #
>
> Thanks,
> Mark
>

So hopefully the process used below is basically correct.

- Mark

cruncher ~ # man mdadm
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdc3[2] sda3[0]
      52436096 blocks [3/2] [U_U]

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
cruncher ~ # mdadm /dev/md3 -f /dev/sdb3
mdadm: set device faulty failed for /dev/sdb3: No such device
cruncher ~ # mdadm /dev/md3 -r /dev/sdb3
mdadm: hot remove failed for /dev/sdb3: No such device or address
cruncher ~ # fdisk /dev/sdb

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): p

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x703d11ba

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1           7       56196   83  Linux
/dev/sdb2               8         530     4200997+  82  Linux swap / Solaris
/dev/sdb3             536        7063    52436160   fd  Linux raid autodetect
/dev/sdb4            7064       60801   431650485    5  Extended
/dev/sdb5            7064       13591    52436128+  fd  Linux raid autodetect

Command (m for help): q

cruncher ~ # mdadm /dev/md3 -a /dev/sdb3
mdadm: re-added /dev/sdb3
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdb3[3] sdc3[2] sda3[0]
      52436096 blocks [3/2] [U_U]
      [>....................]  recovery =  1.3% (695488/52436096) finish=8.6min speed=99355K/sec

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdb3[3] sdc3[2] sda3[0]
      52436096 blocks [3/2] [U_U]
      [===========>.........]  recovery = 56.3% (29540736/52436096) finish=5.0min speed=75950K/sec

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
cruncher ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md11 : active raid0 sde1[1] sdd1[0]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdb3[1] sdc3[2] sda3[0]
      52436096 blocks [3/3] [UUU]

md5 : active raid1 sdb5[1] sdc5[2] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
cruncher ~ #
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
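
Since the degraded md3 may have gone unnoticed for a day or two, here is a
minimal sketch of a check that could be run by hand or from cron. It only
reads text in the /proc/mdstat format shown in the session and prints the
name of every array whose status line reports fewer active members than
expected (for example [3/2]). The function name degraded_arrays is made up
for illustration and is not part of mdadm.

```shell
#!/bin/sh
# Hypothetical helper (not an mdadm command): read /proc/mdstat-style text
# on stdin and print the name of each array that is missing members.
degraded_arrays() {
    awk '
        # Remember which array the following status line belongs to.
        /^md[0-9]+ :/ { name = $1 }

        # Status lines carry [expected/active], e.g. [3/2] for a 3-way
        # mirror with only 2 members running.
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name
        }
    '
}
```

Run as `degraded_arrays < /proc/mdstat`, this would have printed md3 before
the re-add and during recovery, and nothing once the array was back to
[3/3] [UUU]. As for checking before adding the partition back, one option
is to run `mdadm --examine` on each member and compare the Events counters;
as far as I understand it, a member whose counter lags the others is what
the kernel reports as "non-fresh".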