Re: raid10 - won't rebuild - assigns all added disks as spares

NeilBrown <neilb@xxxxxxx> · Tue, 25 Nov 2014 13:28:10 +1100

On Tue, 25 Nov 2014 12:49:12 +1100 Jonathan Molyneux
<jonathan@xxxxxxxxxxxxxxxxxxxx> wrote:

> Hi Everyone,
> 
> Have a strange situation that hasn't happened before.
> Running Debian 7.7 with kernel version 3.2.63-2+deb7u1.
> Have a raid10 that runs the server (boots off a raid1) that, after
> replacing a failed disk, just won't rebuild.
> 
> This is what it looks like without the disk (failed & removed):
> md1 : active raid10 sda2[6] sdc2[4] sdb2[1]
>        1952987136 blocks super 1.2 512K chunks 2 far-copies [4/3] [UUU_]
>        bitmap: 8/15 pages [32KB], 65536KB chunk
> 
> Then when the disk is added:
> md1 : active raid10 sdd2[5](S) sda2[6] sdc2[4] sdb2[1]
>        1952987136 blocks super 1.2 512K chunks 2 far-copies [4/3] [UUU_]
>        bitmap: 8/15 pages [32KB], 65536KB chunk
> 
> Nothing unusual is being spat out in dmesg.
> When removing the disk:
> [313434.073997] md: unbind<sdd2>
> [313434.138307] md: export_rdev(sdd2)
> When adding the disk:
> [313468.056484] md: bind<sdd2>
> 
> This is a strange one that I haven't had before.
> Any thoughts on how to kick the rebuild off without needing a reboot?

I'm sure I've seen this bug before... and fixed it.
I don't remember the details and cannot find anything obvious in change logs.

You could try

   echo recover > /sys/block/md1/md/sync_action
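
If you want to guard that write, here is a minimal sketch that only issues it
when the array reports itself idle and degraded. The function name
kick_recovery and its optional directory argument are made up for
illustration; sync_action and degraded are the standard md sysfs attributes,
and the default path assumes the array is md1 as in this thread.

```shell
#!/bin/sh
# Sketch: ask md to start recovery only when the array is idle and degraded.
# kick_recovery and its directory argument are hypothetical; the default is
# the sysfs directory for md1.
kick_recovery() {
    md_dir=${1:-/sys/block/md1/md}

    # Current sync state (idle, resync, recover, check, repair).
    action=$(cat "$md_dir/sync_action") || return 1
    # Number of missing members; 0 means the array is complete.
    degraded=$(cat "$md_dir/degraded") || return 1

    if [ "$action" != "idle" ]; then
        echo "not idle: $action"
        return 0
    fi
    if [ "$degraded" -eq 0 ]; then
        echo "not degraded"
        return 0
    fi

    # Same write as the one-liner above.
    echo recover > "$md_dir/sync_action" || return 1
    echo "recovery requested"
}
```

Run as root, calling kick_recovery with no argument targets md1. If the
recovery starts, progress should show up in /proc/mdstat.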

Alternatively, if you are re-adding a disk that had just been removed, you could run

   mdadm /dev/md1 --remove /dev/sdd2
   mdadm --zero-superblock /dev/sdd2
   mdadm /dev/md1 --add /dev/sdd2

This will force a full recovery instead of just a bitmap-based recovery.
A full recovery will of course take longer than a bitmap-based one, but since
the bitmap-based recovery isn't starting, it could still be an improvement.
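
To rehearse that sequence before touching the array, something like the
following sketch may help. The names full_readd, run_cmd and DRY_RUN are
invented here, not part of mdadm; with DRY_RUN=1 the commands are only
printed. The zero step is spelled out in full as --zero-superblock.

```shell
#!/bin/sh
# Sketch: remove, wipe and re-add a member so md does a full recovery.
# full_readd, run_cmd and DRY_RUN are hypothetical names for illustration.

# Print the command when DRY_RUN=1, otherwise execute it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

full_readd() {
    array=$1
    member=$2
    if [ -z "$array" ] || [ -z "$member" ]; then
        echo "usage: full_readd ARRAY MEMBER" >&2
        return 2
    fi

    # Stop at the first step that fails.
    run_cmd mdadm "$array" --remove "$member" || return 1
    # Erasing the md superblock makes the device look new to md.
    run_cmd mdadm --zero-superblock "$member" || return 1
    run_cmd mdadm "$array" --add "$member" || return 1
}
```

For the array in this thread that would be, after setting DRY_RUN=1,
"full_readd /dev/md1 /dev/sdd2" to see the three commands, and then the same
call as root without DRY_RUN to run them.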

NeilBrown