Re: Unable to re-add a disk after a reboot.

On Thu, 14 Aug 2014 18:08:30 -0500 Ram Ramesh <rramesh2400@xxxxxxxxx> wrote:

> Hi,
> 
>    I just finished converting a 3-disk raid5 to a 4-disk raid6. After a
> reboot to start clean, I noticed that one of the disks (the new one I
> just added) was missing from /proc/partitions. This was disk 4 in my
> /dev/md0. Assuming a cable issue, I powered off, wiggled the cables,
> and restarted, and the device was found by the kernel. However, md0 shows
> the device as missing and the array as degraded:
> 
>     lata [rramesh] 280 > cat /proc/mdstat
>     Personalities : [raid6] [raid5] [raid4]
>     md0 : active raid6 sdb1[0] sdd1[3] sdc1[1]
>            3906763776 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/3] [UUU_]
> 
>     unused devices: <none>
> 
> However my attempt to --re-add does not work.
> 
>     lata [rramesh] 277 > sudo mdadm /dev/md0 --verbose --re-add /dev/sde1
>     mdadm: --re-add for /dev/sde1 to /dev/md0 is not possible

"re-add" only makes sense when you have a write-intent bitmap, which you
don't have.
So you need to use "--add", which marks the device as a spare and then starts
a complete rebuild.
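
As a rough sketch of that step, using the device names from the report above
(/dev/md0 and /dev/sde1): the "run" wrapper and the DRY_RUN variable are
hypothetical helpers for illustration, not part of mdadm. By default the
commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch only: device names are taken from the report above.
ARRAY=/dev/md0
DISK=/dev/sde1

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_disk() {
    # --add brings the disk in as a spare; md then starts a full rebuild.
    run mdadm "$ARRAY" --add "$DISK"
    # Rebuild progress is reported in /proc/mdstat.
    run cat /proc/mdstat
}

add_disk
```

Running it with DRY_RUN=0 as root would issue the commands for real.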


> I checked SMART and it also shows a lot of reallocated_sector_ct errors.
> So the disk is dying, but I am not able to understand why mdadm
> would not add it.

It will "add".  It just won't "re-add".
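
To check whether the array is still degraded once the disk has been added,
the [4/3] counter in /proc/mdstat can be read. A small sketch, assuming the
mdstat layout shown in the report; the function name degraded_arrays is made
up for illustration:

```shell
#!/bin/sh
# Reads /proc/mdstat-style text on stdin and prints "name configured/active"
# for every array that has fewer active members than configured.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            # Strip the brackets, then split "4/3" into its two counts.
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name, n[1] "/" n[2]
        }
    '
}

# Only read the live file where it exists.
if [ -r /proc/mdstat ]; then
    degraded_arrays < /proc/mdstat
fi
```

An array with all members active produces no output.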

NeilBrown


> 
>     SMART Attributes Data Structure revision number: 16
>     Vendor Specific SMART Attributes with Thresholds:
>     ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>       1 Raw_Read_Error_Rate     0x000b   091   091   016    Pre-fail  Always       -       53
>       2 Throughput_Performance  0x0005   100   100   054    Pre-fail  Offline      -       0
>       3 Spin_Up_Time            0x0007   135   135   024    Pre-fail  Always       -       426 (Average 425)
>       4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       59
>    *  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 330*
>       7 Seek_Error_Rate         0x000b   098   098   067    Pre-fail  Always       -       2
>       8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
>       9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       3445
>      10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
>      12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       59
>     192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       548
>     193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       548
>     194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 21/43)
>     196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       17604
>     197 Current_Pending_Sector  0x0022   001   001   000    Old_age   Always       -       13256
>     198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
>     199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
> 
> Any recommendations while I am waiting to get a replacement?
> 
> Ramesh
