Sorry for the delay
On 4/30/19 3:51 PM, Andreas Klauer wrote:
Something else must have happened... either a bug (are you using
latest mdadm / kernel versions?), or trying to store the backup
file on the raid itself, or maybe something blocking mdadm
(happened before with selinux/apparmor, I think)
or otherwise something interfering with the process.
If you can reproduce the issue you could investigate in more detail...
I have been able to reproduce the issue on the machine I used for
experiments last night.
It is running Debian 9 (last apt update/upgrade: mdadm 3.4-4+b1 and
linux 4.9.0-9-amd64, exactly like the server). Does Debian include your
latest bug fixes in its version? I know this is done for other
packages, so I assume the answer is yes, but you probably know more
about it than I do.
But even though the same failure does happen sometimes, it doesn't
happen every time: even with big one-liner commands to test again and
again, it only showed up once more. So I guess it's some kind of
instability?
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sde1
/dev/sdf1 /dev/sdc1
mdadm --grow --bitmap=internal /dev/md0
shred -n 1 -v /dev/sdd
gparted (to create the /dev/sdd1 partition)
mdadm --add /dev/md0 /dev/sdd1
mdadm --grow --raid-devices=4 --backup-file=/root/grow_md0.bak /dev/md0
And it got stuck exactly like my server did last night. 3 KB had been
written on each existing disk of the array, 1.5 KB on the new one.
Stuck at 0%, and unable to assemble after a stop or reboot.
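For anyone who wants to look at the same thing, the stuck state can be
inspected with the usual status commands (just an example, using the
device names from the test above):

cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --examine /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
dmesg | tail -n 50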
Some of these details may be irrelevant and were only there to
reproduce a case identical to the failure I had on the server:
-> Not sure whether the order of the drives matters for triggering
this fault
-> Not sure whether the internal intent bitmap matters
-> Not sure whether the shred followed by the creation of the
/dev/sdd1 partition before the add and grow matters
The only condition I'm sure about is that it can happen even when the
volume isn't mounted in any directory (so it's unlikely to be some
read/write access interfering during the procedure!)
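To be explicit about that point, a quick way to check that nothing has
the array open (standard tools, shown only as an example):

grep md0 /proc/mounts
lsof /dev/md0
fuser -vm /dev/md0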
On 5/1/19 1:39 AM, Wols Lists wrote:
On 30/04/19 09:25, Julien ROBIN wrote:
I'm about to play the following command :
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdd1 /dev/sde1
/dev/sdb1 --assume-clean
Is it fine?
You clearly haven't read the raid wiki
https://raid.wiki.kernel.org/index.php/Linux_Raid
It states NEVER EVER use mdadm --create on an existing array unless you
(or the person helping you) really knows what they are doing.
That's not quite true: when I finally decided to press "enter", I was
far more sure about what I was doing than the wiki alone would have
made me ;)
mdadm failed despite the fact that I had read (and followed) the wiki
in full several times, long before and again last night. I eventually
found the information about:
* how to determine the correct parameters when you use --create
* the exact conditions that must be met for --create to restore access
to an existing array (when mdadm or something else has blocked or
misconfigured the array but the data is still laid out in a
predictable way). Some other cases aren't recoverable with --create.
I found most of that information elsewhere on the Internet and through
my own tests; there is a lot that can be understood and explained which
would be useful in the wiki. The most interesting parts are the cases
in which --create can destructively rewrite some data on a disk (game
over), and on which disk (so that in some of those cases, the other
untouched disks can even be used to reconstruct the destroyed one with
--create and the correct parameters, so the game wasn't really over
after all). Unfortunately, none of this is in the wiki.
Well, all this just to say that I knew what I was doing, as the wiki
requires.
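To give an idea of what I mean, here is a sketch of the kind of checks
I'm talking about before re-running --create on an existing array (the
chunk size, layout and metadata version below are placeholder values;
the real ones have to be read with --examine on the actual members
first):

# Read the existing metadata and note Raid Level, Chunk Size, Layout,
# Data Offset and the Device Role of each member:
mdadm --examine /dev/sdd1 /dev/sde1 /dev/sdb1

# Re-create with --assume-clean and the SAME parameters and device
# order, so no resync touches the data (example values only):
mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=3 \
      --metadata=1.2 --chunk=512 --layout=left-symmetric \
      /dev/sdd1 /dev/sde1 /dev/sdb1
# (If the Data Offset reported by --examine differs from the current
#  mdadm default, it has to be forced with --data-offset as well.)

# Check read-only before trusting anything:
fsck -n /dev/md0
mount -o ro /dev/md0 /mnt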
Another thing it says is always update to the latest mdadm. I don't
remember you telling us what version you're using, and the problem you
describe sounds very much like something I suspect has been fixed in the
latest versions.
Yes, sorry for the delay, I forgot to mention it in my previous posts.
It is running Debian 9 (last apt update/upgrade: mdadm 3.4-4+b1 and
linux 4.9.0-9-amd64)
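For the record, I simply read those versions off the standard tools:

dpkg -l mdadm
mdadm --version
uname -r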
I would guess that if it were a known bug, fixed some time ago, Debian
would have backported the patch into Debian 9? If that's right, it
means the problem may still exist upstream. If I'm wrong, and the
Debian "mdadm" version is the one that's broken, I won't feel very
comfortable having to use the "master / sid / experimental" branches
to get "safety and stability" (that would be really unusual).
By the way, many thanks for your answers. I would be glad if my case
helped you find something to improve in mdadm; sorry if not!
Best regards,
Julien