Re: RAID5 mdadm --grow wrote nothing (Reshape Status : 0% complete) and cannot assemble anymore

Sorry for the delay.

On 4/30/19 3:51 PM, Andreas Klauer wrote:

> Something else must have happened... either a bug (are you using
> latest mdadm / kernel versions?), or trying to store the backup
> file on the raid itself, or maybe something blocking mdadm
> (happened before with selinux/apparmor, I think)
> or otherwise something interfering with the process.
>
> If you can reproduce the issue you could investigate in more detail...


I have been able to reproduce the issue on the machine I used for experiments last night.

It is running Debian 9 (last apt update/upgrade: mdadm 3.4-4+b1 and linux 4.9.0-9-amd64, exactly like the server). Does Debian include your latest bug fixes in its version? I know this is done for other packages and I assume the answer should be yes, but you probably know more about it than I do.


But even though the same failure happens sometimes, it does not happen every time, and even with big one-liner commands to test again and again, it showed up only once more. So I guess it's some kind of instability? The full sequence was:

    mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sde1 /dev/sdf1 /dev/sdc1
    mdadm --grow --bitmap=internal /dev/md0
    shred -n 1 -v /dev/sdd
    gparted
    mdadm --add /dev/md0 /dev/sdd1
    mdadm --grow --raid-devices=4 --backup-file=/root/grow_md0.bak /dev/md0
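
In case someone wants to reproduce this: the device names above are specific to my test machine, and the following is only a quick sketch of the obvious ways to see whether the reshape moves at all, nothing more:

    # reshape progress as maintained by the kernel
    watch -n 5 cat /proc/mdstat
    # reshape state/position in the array details
    mdadm --detail /dev/md0 | grep -i -E 'state|reshape'
    # raw reshape position (in sectors) from sysfs
    cat /sys/block/md0/md/reshape_position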

And it got stuck exactly like my server last night: 3 KB were written on each existing disk of the array, 1.5 KB on the new one. Stuck at 0% and unable to assemble after a stop or reboot.
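
For reference, an assemble attempt for an array stuck mid-reshape would look something like this (just a sketch with my test device names, not my exact command history; --invalid-backup is the documented option for when the content of the backup file cannot be used):

    # stop whatever is left of the array
    mdadm --stop /dev/md0
    # reassemble, pointing mdadm at the reshape backup file
    mdadm --assemble --force --backup-file=/root/grow_md0.bak \
          /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdc1 /dev/sdd1
    # if mdadm rejects the (empty, never-written) backup file:
    mdadm --assemble --force --invalid-backup \
          --backup-file=/root/grow_md0.bak \
          /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdc1 /dev/sdd1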

Some details may be useless and were only there to reproduce conditions identical to the failure I had on the server:
 -> Not sure whether the order of the drives matters for triggering this fault
 -> Not sure whether the internal intent bitmap (Intent Bitmap : Internal) matters
 -> Not sure whether the shred and the recreation of the /dev/sdd1 partition before the add and grow matter


The only condition I'm sure about is that it can happen even when the volume isn't mounted in any directory (so it is unlikely to be some read/write access interfering with the procedure!)
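
For what it's worth, the usual ways to double-check that nothing holds the device open are simple enough (a quick sketch, assuming the array is /dev/md0):

    # anything mounted from the array?
    grep md0 /proc/mounts
    # any process with the md device open?
    lsof /dev/md0
    fuser -v /dev/md0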


On 5/1/19 1:39 AM, Wols Lists wrote:
> On 30/04/19 09:25, Julien ROBIN wrote:
>> I'm about to play the following command :
>>
>> mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdd1 /dev/sde1
>> /dev/sdb1 --assume-clean
>>
>> Is it fine ?
>
> You clearly haven't read the raid wiki
> https://raid.wiki.kernel.org/index.php/Linux_Raid
>
> It states NEVER EVER use mdadm --create on an existing array unless you
> (or the person helping you) really knows what they are doing.



That's not exact: when I finally decided to press "enter", I was far more sure of what I was doing than the wiki requires ;)

mdadm failed despite the fact that I read (and respected) the wiki entirely, several times, long before and again last night. I eventually found the information about:
 * how to determine the correct parameters when you use --create
 * the exact conditions that must be met for --create to restore access to an existing array (when mdadm or something else blocked/misconfigured the array but the data is still there in a predictable layout). Some other cases are not recoverable with --create.

I found most of this information elsewhere on the Internet and through my own tests. There is a lot that could be understood and explained that would be useful in the wiki. The most interesting parts are the cases in which --create can badly rewrite some data on a disk (game over), and on which disk, so that in some of those cases the other, untouched disks can even be used to reconstruct the destroyed one using --create with correct parameters (so the game wasn't really over).
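
To make "correct parameters" concrete, here is the general method as I understand it, with example values only (the chunk size, layout, metadata version and data offset below are made up and must come from your own --examine output, taken before anything is overwritten):

    # save the existing superblocks BEFORE any --create attempt
    mdadm --examine /dev/sd[cef]1 > examine_before.txt
    # the new --create must reuse, from that output:
    #   Raid Level, Raid Devices, Chunk Size, Layout,
    #   metadata Version, Data Offset, and the Device Role of
    #   each member (the roles fix the device order)
    # --assume-clean should only rewrite superblocks (with the same
    # data offset), not the data; a destroyed member can be given
    # as the word "missing"
    mdadm --create /dev/md0 --assume-clean \
          --level=5 --raid-devices=3 --chunk=512 \
          --layout=left-symmetric --metadata=1.2 --data-offset=128M \
          /dev/sde1 /dev/sdf1 /dev/sdc1
    # then verify read-only (fsck -n, or a read-only mount)
    # before trusting or resyncing anything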

But unfortunately, none of those things are in the wiki.
Well, all that just to say that I knew what I was doing, as the wiki asks.


> Another thing it says is always update to the latest mdadm. I don't
> remember you telling us what version you're using, and the problem you
> describe sounds very much like something I suspect has been fixed in the
> latest versions.

Yes, sorry for the delay, I forgot to mention it in my previous posts: it is running Debian 9 (last apt update/upgrade: mdadm 3.4-4+b1 and linux 4.9.0-9-amd64).

I would guess that if this were a known bug, fixed some time ago, Debian would have included the patch in Debian 9? If that's right, it means the problem may still exist upstream. If I'm wrong, and the Debian "mdadm" version really is that unstable, I won't feel comfortable using the "master / sid / experimental" branches for "safety and stability" (that would be really uncommon).
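
If it helps to answer that on my side, the Debian changelog lists which upstream fixes were cherry-picked, so a quick look is easy enough (just a sketch of where to look, nothing more):

    # exact versions in use
    mdadm --version
    uname -r
    # which upstream patches Debian pulled into this package
    apt changelog mdadm | less
    # or offline, from the installed package
    zcat /usr/share/doc/mdadm/changelog.Debian.gz | less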


By the way, many thanks for your answers. I would be glad if my case helps you find something to improve in mdadm - sorry if not!

Best regards,
Julien


