(I wanted to react to the thread "admin-guide page for raid0 layout issue", but I just registered and I don't know how to respond to
existing messages.)
I would like to make some suggestions regarding the recent raid0 layout patch, as it made my system unbootable, and it took me quite some
time to figure out what was wrong and how to fix it. I also found conflicting information on the web. I am just a regular user, not a
programmer or Linux guru, so take my suggestions as such.
* Wherever the values are documented, all three of 0, 1, and 2 should be explicitly documented (not only two of them). If I am not
mistaken, 0 means "unset", 1 means "old layout" (kernel 3.14 and older), 2 means "new layout" (3.15 and later).
* When trying to assemble an existing array without the kernel parameter set (i.e. set to 0), it fails silently. The only hint is a message
in the kernel ring buffer:
  md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
  md/raid0: please set raid.default_layout to 1 or 2
When trying to create a raid0 array, mdadm gives an error, but it is not helpful:
  mdadm: Defaulting to version 1.2 metadata
  mdadm: RUN_ARRAY failed: Unknown error 524
In both cases, the messages in both places (mdadm and dmesg) should be more informative.
* The recommended parameter value for new raid0 arrays should be made clear. I guess it's 2.
* Various places where documentation could (or should) be added:
- mdadm error messages
- kernel ring buffer messages
- mdadm man page
- mdadm wiki
- kernel parameter documentation pages
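To illustrate the kind of message I would have found helpful, here is a rough Python sketch. It is only an illustration, not how mdadm or
the kernel work: the function names describe_layout and explain_dmesg are made up, and the meanings of 0, 1, and 2 are my own understanding
as stated above, which may be wrong.

```python
import re

# Meaning of each value as I understand it (an assumption, not official wording).
LAYOUT_MEANINGS = {
    0: "unset (multi-zone raid0 arrays are not assembled)",
    1: "old layout",
    2: "new layout",
}


def describe_layout(value):
    """Return a human-readable meaning for a default_layout value."""
    return LAYOUT_MEANINGS.get(value, "unknown value")


# Matches the first ring-buffer line quoted above and captures the array name.
_LAYOUT_ERROR = re.compile(
    r"md/raid0:(\w+): cannot assemble multi-zone RAID0 with default_layout setting"
)


def explain_dmesg(log_text):
    """Return one hint per array that was refused because default_layout is unset."""
    hints = []
    for match in _LAYOUT_ERROR.finditer(log_text):
        array = match.group(1)
        hints.append(
            f"{array}: not assembled because raid0.default_layout is 0 "
            f"({describe_layout(0)}); set it to 1 ({describe_layout(1)}) "
            f"or 2 ({describe_layout(2)})"
        )
    return hints


if __name__ == "__main__":
    sample = (
        "md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting\n"
        "md/raid0: please set raid.default_layout to 1 or 2\n"
    )
    for hint in explain_dmesg(sample):
        print(hint)
```

Something along these lines, printed by mdadm itself instead of "Unknown error 524", would have saved me a lot of searching.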
Confusions:
* The definition of the parameter values is wrong in the patch description at
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9#diff-158c54ea7ccae01a77ae3f5d44ab0f94 where it says 0 is
old and 1 is new. Please fix, because this contributes to confusion, and may even lead to data corruption.
* On the raid mailing list https://www.spinics.net/lists/raid/msg63337.html someone said "new (1) and old (2) vs. unset (0)". No one
objected, but I guess that this is also wrong?
* Two of the few web pages on this issue conflict on the meaning of parameter values 1 and 2.
https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/ says 1 is old, 2 is new.
https://www.reddit.com/r/linuxquestions/comments/debx7w/mdadm_raid0_default_layout/ says 2 is old, 1 is new.
* https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/ suggests that the kernel parameter should be set in GRUB
as GRUB_CMDLINE_LINUX_DEFAULT="raid0.default_layout=2" (or 1), but in my opinion it should be set in GRUB_CMDLINE_LINUX instead, because
GRUB_CMDLINE_LINUX_DEFAULT is not used in recovery mode, but GRUB_CMDLINE_LINUX is. So, please document all possible (recommended) ways to
set the parameter: GRUB, /etc/modprobe.d/00-local.conf, and /sys/module/raid0/parameters/default_layout.
* I was also wondering why the patch had to disable assembling if it was a working array on my system. Isn't it obvious, based on the kernel
version with which it worked before the update, whether it should be 1 or 2? Why wasn't it possible to first automatically set the default
kernel variable in grub.cfg and then do the update?
* Why is this parameter actually a *kernel* parameter? While not very likely, it is possible that two arrays with different layouts (needing
different parameter settings) will end up in the same machine. In such a case any parameter choice may lead to data corruption. I would
think that the layout parameter is a property of the specific array, so it should be in the metadata of the array itself.
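Coming back to the three ways of setting the parameter that I listed above (GRUB, modprobe.d, and sysfs), this is roughly what I would
expect the documentation to spell out. The small Python sketch below only prints the lines as I understand them; the function name
layout_settings is made up, and I am assuming the usual "options <module> <param>=<value>" syntax for modprobe.d and that the sysfs file
accepts a plain number. It does not modify anything on the system.

```python
# Path taken from the list above.
SYSFS_PATH = "/sys/module/raid0/parameters/default_layout"


def layout_settings(value):
    """Return the three settings for a given default_layout value (1 or 2)."""
    if value not in (1, 2):
        raise ValueError("default_layout must be 1 or 2")
    return {
        # Line for /etc/default/grub; GRUB_CMDLINE_LINUX also applies in recovery mode.
        "grub": f'GRUB_CMDLINE_LINUX="raid0.default_layout={value}"',
        # Line for /etc/modprobe.d/00-local.conf (assumed syntax).
        "modprobe": f"options raid0 default_layout={value}",
        # (path, content) pair for a runtime change via sysfs.
        "sysfs": (SYSFS_PATH, str(value)),
    }


if __name__ == "__main__":
    settings = layout_settings(2)
    print(settings["grub"])
    print(settings["modprobe"])
    print(f"echo {settings['sysfs'][1]} > {settings['sysfs'][0]}")
```

As far as I understand, a change in /etc/default/grub only takes effect after grub.cfg has been regenerated, which is one more step that
would be worth mentioning in the documentation.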