On Thu, 15 Aug 2013 09:09:40 +0000 <Matthias.Blaesing@xxxxxxxxxx> wrote:

> Hello,
>
> I'm currently fighting a server problem and have the feeling that I'm running into walls.
>
> Summary: On one of our servers we suffered from a hard disk error that led to a degraded array.
> The hardware was replaced and the array was rebuilt. On one of the RAID sets the newly added
> disk is not activated but stays as a spare.
>
> System: SUSE Linux Enterprise Server 11 (x86_64) 11.2
>
> The current state:
>
> # cat /proc/mdstat
>
> Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4]
> md3 : active raid1 sda3[2](S) sdb3[0]
>       970888192 blocks [2/1] [U_]
>
> md1 : active raid1 sda1[0] sdb1[1]
>       3911680 blocks [2/2] [UU]
>
> unused devices: <none>
>
> # mdadm --detail /dev/md3
>
> /dev/md3:
>         Version : 0.90
>   Creation Time : Fri Feb  4 11:47:04 2011
>      Raid Level : raid1
>      Array Size : 970888192 (925.91 GiB 994.19 GB)
>   Used Dev Size : 970888192 (925.91 GiB 994.19 GB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 3
>     Persistence : Superblock is persistent
>
>     Update Time : Thu Aug 15 10:22:07 2013
>           State : clean, degraded
>  Active Devices : 1
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 1
>
>            UUID : e9d9c5f5:615c789e:3fb6082e:e5593158
>          Events : 0.18857541
>
>     Number   Major   Minor   RaidDevice State
>        0       8       19        0      active sync   /dev/sdb3
>        1       0        0        1      removed
>
>        2       8        3        -      spare   /dev/sda3
>
> I would expect the raid system to move /dev/sda3 to number 1 and mark it as active.
>
> Versions:
>
> # uname -a
> Linux 3.0.58-0.6.6-default #1 SMP Tue Feb 19 11:07:00 UTC 2013 (1576ecd) x86_64 x86_64 x86_64 GNU/Linux
> # mdadm -V
> mdadm - v3.2.2 - 17th June 2011
>
> I tried:
>
> * removing /dev/sda3 from the array and adding it back
> * removing /dev/sda3 from the array, zeroing its superblock (--zero-superblock), and adding it back
> * removing /dev/sda3 from the array, reducing raid-devices to one, and adding /dev/sda3 back
> * removing /dev/sda3 from the array, zeroing the first part of the disk (with dd), and adding it back
>
> I would really appreciate ideas on how to fix this (preferably while the system is running).

Strange. I would definitely have expected one of those to start the recovery.

Does anything appear in the kernel logs (e.g. output of 'dmesg')?

What does
   grep . /sys/block/md3/md/*
show?

I don't suppose
   echo recover > /sys/block/md3/md/sync_action
helps?

Is there still a kernel thread called md3_raid1 running?

NeilBrown
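For readers following along, a minimal sketch of the remove/zero/re-add cycle described in the original report, together with the diagnostics suggested in the reply. The device names (/dev/md3, /dev/sda3) are taken from the thread; the exact command forms are an assumption, not a transcript of what was actually run, and should only be used against the correct degraded array.

    # Drop the stuck spare from the array and wipe its md metadata
    mdadm /dev/md3 --remove /dev/sda3
    mdadm --zero-superblock /dev/sda3

    # Re-add the partition; normally this starts a resync into the missing slot
    mdadm /dev/md3 --add /dev/sda3

    # Diagnostics suggested in the reply
    dmesg | tail -n 50                               # recent kernel messages
    grep . /sys/block/md3/md/*                       # full md sysfs state for md3
    echo recover > /sys/block/md3/md/sync_action     # try to kick off recovery manually
    ps -e | grep md3_raid1                           # is the raid1 kernel thread still alive?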