On 08/30/2018 10:32 AM, Niklas Hambüchen wrote:
Hi,
I'm taking this to the mailing list because I've failed to get a clear
answer to this so far.
There are multiple reports floating around the Internet where users
report that after an HD is removed, destroyed and/or replaced and the
system is then rebooted, raid5/raid1 configurations show up as raid0
in mdadm after the reboot.
1. https://unix.stackexchange.com/questions/257599/raid5-array-reassembles-as-raid0
2. https://superuser.com/questions/117824/how-to-get-an-inactive-raid-device-working-again/118277#118277
3. http://fibrevillage.com/storage/676-how-to-fix-linux-mdadm-inactive-array
There's lots of head-scratching and inconclusiveness in there; people
manually fix the situation and then move on without detailed investigation.
I have observed this now on one server too, and decided to test it on
my desktop where I have an mdadm RAID1 running:
Before the reboot, `mdadm --detail` tells me
Raid Level : raid1
and the usual output.
Then I power off, unplug the SATA cable of one of the disks, power on,
and get
Raid Level : raid0
instead.
`cat /proc/mdstat` shows me something like:
md0 : inactive sdc2[0](S)
9766302720 blocks super 1.2
in that case.
This is in contrast to what happens if I unplug an HD during operation
without a reboot, in which case I get the usual "degraded" [_U] output:
md0 : active (auto-read-only) raid1 sde[2]
2930135360 blocks super 1.2 [2/1] [_U]
(Versions used are Ubuntu 16.04, stock kernel 4.15.0-33-generic and
stock mdadm v3.3)
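For reference, the raw value behind mdadm's "Raid Level" line can be read
straight from the md device with a small probe like the sketch below
(/dev/md0 is a placeholder; on an inactive array the GET_ARRAY_INFO ioctl
may simply fail with ENODEV instead of reporting a level):

/*
 * Sketch only: print the raw level field returned by GET_ARRAY_INFO,
 * which is where mdadm --detail gets "Raid Level" from when the ioctl
 * succeeds.  /dev/md0 is a placeholder for the array device.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/major.h>
#include <linux/raid/md_u.h>

int main(void)
{
	mdu_array_info_t info;
	int fd = open("/dev/md0", O_RDONLY);

	if (fd < 0) {
		perror("open /dev/md0");
		return 1;
	}
	if (ioctl(fd, GET_ARRAY_INFO, &info) < 0) {
		perror("GET_ARRAY_INFO");	/* inactive arrays may report ENODEV */
		close(fd);
		return 1;
	}
	/* 0 = raid0, 1 = raid1, 5 = raid5, ... */
	printf("level=%d raid_disks=%d active_disks=%d\n",
	       info.level, info.raid_disks, info.active_disks);
	close(fd);
	return 0;
}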
To my knowledge, the raid level being reported as raid0 is probably
caused by the code below in set_array_info:
memset(&inf, 0, sizeof(inf));
inf.major_version = info->array.major_version;
inf.minor_version = info->array.minor_version;
rv = md_set_array_info(mdfd, &inf);
And during the reboot stage mdadm only issues two ioctls
(SET_ARRAY_INFO and ADD_NEW_DISK) against the array.
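Roughly, I believe the assemble-time sequence looks like the sketch
below (not mdadm's actual code; the helper name, device path and
major/minor numbers are made up, and error handling is trimmed):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/major.h>
#include <linux/raid/md_u.h>

/* Sketch of the assemble-time ioctl sequence, not mdadm's real code. */
static int assemble_sketch(const char *md_path, int disk_major, int disk_minor)
{
	mdu_array_info_t inf;
	mdu_disk_info_t disk;
	int mdfd = open(md_path, O_RDWR);

	if (mdfd < 0)
		return -1;

	/* Step 1: SET_ARRAY_INFO with everything zeroed except the
	 * superblock version -- no level, no raid_disks. */
	memset(&inf, 0, sizeof(inf));
	inf.major_version = 1;
	inf.minor_version = 2;
	if (ioctl(mdfd, SET_ARRAY_INFO, &inf) < 0)
		goto fail;

	/* Step 2: ADD_NEW_DISK for each member device that is still present. */
	memset(&disk, 0, sizeof(disk));
	disk.major = disk_major;
	disk.minor = disk_minor;
	if (ioctl(mdfd, ADD_NEW_DISK, &disk) < 0)
		goto fail;

	/* Step 3 never happens in the failure case: no RUN_ARRAY ioctl,
	 * so do_md_run() is not reached, the superblocks are not analysed,
	 * and the array is left inactive with no personality (which mdadm
	 * --detail then appears to report as raid0). */
	close(mdfd);
	return 0;

fail:
	close(mdfd);
	return -1;
}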
Questions:
Is it expected that raid1 turns into raid0 in this way when an expected
device is not present during a reboot (e.g. because it was unplugged or
replaced)?
If yes, what is the idea behind that, and why doesn't it go into the
normal degraded mode instead?
Is it possible to achieve that, i.e. to come up degraded instead? I had
hoped to be able to continue booting into a degraded system if a disk
fails across a reboot (and then be notified of the degradation by mdadm
as usual), but that doesn't work if the array comes back as raid0 and
inactive after the reboot.
Finally, if these topics are already explained somewhere, where can I
read more about it?
Maybe we need to call do_md_run when assembling an array; this needs
more investigation. Something along these lines:
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 12e8fbee8fed..8516778ca650 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6757,6 +6757,7 @@ static int set_array_info(struct mddev *mddev, mdu_array_info_t *info)
 		/* ensure mddev_put doesn't delete this now that there
 		 * is some minimal configuration.
 		 */
 		mddev->ctime = ktime_get_real_seconds();
+		do_md_run(mddev);
 		return 0;
 	}
Thanks,
Guoqing