Ok thanks. I tried to deal with overlays, but I understood nothing of
what I was writing, so it was getting dangerous (if things already went
wrong when I typed simple, correct commands while believing I understood
the situation, I wasn't going to take the risk of making it even more
complicated for myself).
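(For anyone reading this later: as far as I can tell, the overlay idea
from that answer looks roughly like the sketch below, with example names
and sizes - but I didn't feel confident enough to actually run it.)

  # Sparse file as copy-on-write store, so nothing is ever written to the real disk
  truncate -s 8T /tmp/overlay-sdd1.img
  loopdev=$(losetup --show -f /tmp/overlay-sdd1.img)
  size=$(blockdev --getsz /dev/sdd1)            # member size in 512-byte sectors
  dmsetup create overlay-sdd1 --table "0 $size snapshot /dev/sdd1 $loopdev N 8"
  # repeat for each member, then experiment on the /dev/mapper/overlay-* devices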
The StackExchange answer's point is "the problem is that you don't
know", so I made sure that statement no longer held true for me (by
checking against an absolutely identical configuration created with the
same OS on the same date, plus a blank test, sketched below: the default
values were correct):
* Version : 1.2
* Layout : left-symmetric
* Chunk Size : 512K
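The blank test was essentially the following (just a sketch: the image
files, the /dev/md9 name and the sizes are arbitrary):

  # Throwaway RAID5 on loop devices, created with default options,
  # just to read back what the defaults are on this mdadm version
  truncate -s 100M /tmp/blank0.img /tmp/blank1.img /tmp/blank2.img
  l0=$(losetup --show -f /tmp/blank0.img)
  l1=$(losetup --show -f /tmp/blank1.img)
  l2=$(losetup --show -f /tmp/blank2.img)
  mdadm --create /dev/md9 --level=5 --raid-devices=3 "$l0" "$l1" "$l2"
  mdadm --detail /dev/md9 | grep -E 'Version|Layout|Chunk Size'
  # cleanup
  mdadm --stop /dev/md9
  losetup -d "$l0" "$l1" "$l2"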
So, now that I was sure I knew the parameters, I ran the command from my
second mail, and everything is back online, with my "RAID-VOLUME" ext4
filesystem healthy and available.
Anyway, I'll keep a copy of the output below for the future, and record
each disk's serial number together with its position in the array
(serial of /dev/sdd1, then /dev/sde1, then /dev/sdb1) in a file which
won't be stored on the array ;) That is enough to be able to recreate
the array later with the right parameters and the right disk positions.
Every 1.0s: mdadm --detail /dev/md0          Pix-Server-Sorel: Tue Apr 30 12:40:20 2019

/dev/md0:
           Version : 1.2
     Creation Time : Tue Apr 30 12:39:24 2019
        Raid Level : raid5
        Array Size : 15627788288 (14903.82 GiB 16002.86 GB)
     Used Dev Size : 7813894144 (7451.91 GiB 8001.43 GB)
      Raid Devices : 3
     Total Devices : 3
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Apr 30 12:39:25 2019
             State : clean
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

              Name : Pix-Server-Sorel:0  (local to host Pix-Server-Sorel)
              UUID : 8b28c8aa:28bb7706:fd24a971:7fc464cc
            Events : 1

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       65        1      active sync   /dev/sde1
       2       8       17        2      active sync   /dev/sdb1
Thanks for your answer Andreas; I was waiting for at least one expert
look before typing anything.
------------------------------------------------------------------------
Is this the right place to file a bug report for what happened here?
Because what caused this mess cannot be considered normal:
* Add a prepared disk (unformatted partition) to the array (check
  with mdadm --detail that it now appears as a spare)
* Grow the array to its current size + 1: at that point the array got
  stuck and kind of "crashed" (roughly the commands sketched below).
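For reference, the two steps were essentially these commands (from
memory, so treat the device name of the new disk as approximate):

  # Step 1: add the prepared disk; it should appear as a spare
  mdadm --add /dev/md0 /dev/sdc1
  mdadm --detail /dev/md0          # check that /dev/sdc1 shows up as "spare"
  # Step 2: grow from 3 to 4 raid devices -- this is where it got stuck
  mdadm --grow /dev/md0 --raid-devices=4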
I feel like nobody knows why this happened. And in this precise case,
nobody knew exactly what should be done - it's lucky that I noticed with
bwm-ng that there was absolutely no reshaping activity. And there are
other people online who have asked for help for the same reason (though
without having 13.10 TiB of data on it), so the problem is recurring.
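To be explicit about how I checked (bwm-ng is just the tool I happened
to use; it has to be installed separately):

  # During a real reshape, /proc/mdstat shows a "reshape = X.X% ..." progress line
  watch -n1 cat /proc/mdstat
  # plus a disk I/O monitor (bwm-ng in my case) to confirm the member disks stayed idle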
Worse, the documentation says to create a backup file so that the array
can be recovered if an incident happens in the early stages, but doing
so completely messes up the early stage of the reshape, and the backup
is unusable.
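What I mean by the documented way (the backup file path here is only an
example, and it has to live on a device that is not part of the array):

  # Grow with an explicit backup of the critical section; if this backup
  # itself turns out unusable, the safety net is gone
  mdadm --grow /dev/md0 --raid-devices=4 --backup-file=/root/md0-grow.backup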
Of course RAID will never protect against mistakes, but mdadm is not
supposed to be the mistake! This is why I believe whatever causes this
recurring case should be understood and corrected.
To start the investigation: the only thing I did this time that was
different from the other times was to run "dd if=/dev/urandom
of=/dev/sdc bs=1024k status=progress" before preparing the disk, just to
be sure its whole surface was working. Do you think this could be the
cause of the mdadm error? Should the new disk always be full of zeroes
before preparing the partition, then adding and growing?
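Put differently, should the preparation rather have looked like this?
(Just a guess on my side, not something I have verified.)

  # Fill with zeroes instead of random data, so any stale metadata is gone
  dd if=/dev/zero of=/dev/sdc bs=1024k status=progress
  # ... create the partition as usual, then make sure nothing is left on it:
  wipefs -a /dev/sdc1
  mdadm --zero-superblock /dev/sdc1 2>/dev/null || true   # no-op if no old superblock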
Best regards,
Julien ROBIN
On 4/30/19 11:23 AM, Andreas Klauer wrote:
> On Tue, Apr 30, 2019 at 10:25:24AM +0200, Julien ROBIN wrote:
>> I'm about to play the following command :
>> mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdd1 /dev/sde1
>> /dev/sdb1 --assume-clean
>> Is it fine ?
>
> If you must re-create
> - use overlays
> - see https://unix.stackexchange.com/a/131927/30851
>
> Regards
> Andreas Klauer