Helping an administrator understand this issue and how to deal with it
requires more text than is achievable in a kernel error message. Let's
clarify the issue in the admin guide, and have the kernel emit a link
to it.

Note that this mentions a few current limitations of the mdadm tool. As
those get addressed, we should update the text to point to a recommended
minimum version of mdadm that addresses these issues.

Fixes: c84a1372df92 ("md/raid0: avoid RAID0 data corruption due to layout confusion.")
Cc: stable@xxxxxxxxxxxxxxx (3.14+)
Signed-off-by: dann frazier <dann.frazier@xxxxxxxxxxxxx>
---
 Documentation/admin-guide/md.rst | 44 ++++++++++++++++++++++++++++++++
 drivers/md/raid0.c               |  2 ++
 2 files changed, 46 insertions(+)

diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
index 3c51084ffd379..4a37a50d51f97 100644
--- a/Documentation/admin-guide/md.rst
+++ b/Documentation/admin-guide/md.rst
@@ -759,3 +759,47 @@ These currently include:
 
   ppl_write_hint
      NVMe stream ID to be set for each PPL write request.
+
+Multi-Zone RAID0 Layout Migration
+---------------------------------
+An unintentional RAID0 layout change was introduced in the v3.14 kernel.
+This effectively means there are two different layouts Linux will use to
+write data to RAID0 arrays in the wild - the "pre-3.14" way and the
+"3.14 and later" way. Mixing these layouts by writing to an array while
+booted on these different kernel versions can lead to corruption.
+
+Note that this only impacts RAID0 arrays that include devices of different
+sizes. If your devices are all the same size, both layouts are equivalent,
+and your array is not at risk of corruption due to this issue.
+
+Unfortunately, the kernel cannot detect which layout was used for writes
+to pre-existing arrays, and therefore requires input from the
+administrator. This input can be provided via the kernel command line
+with the ``raid0.default_layout=<N>`` parameter, or by setting the
+``default_layout`` module parameter when loading the ``raid0`` module.
+You can also set the layout on a per-array basis using the ``layout``
+attribute of the array in the sysfs filesystem (but only when the array is
+stopped).
+
+Note that, as of this writing, ``mdadm`` requires ``raid0.default_layout``
+to be set when creating new multi-zone arrays as well. ``mdadm`` also does
+not yet have a way to store the layout type in the array itself. Until it
+does, either the ``default_layout`` parameter or the per-array ``layout``
+sysfs attribute needs to be set on every boot.
+
+Which layout version should I use?
+++++++++++++++++++++++++++++++++++
+If your RAID array has only been written to by a 3.14 or later kernel, then
+you should specify version 2. If your array has only been written to by a
+pre-3.14 kernel, then you should specify version 1. If the array may have
+already been written to by both kernels < 3.14 and >= 3.14, then it is
+possible that your data has already suffered corruption. Note that
+``mdadm --detail`` will show you when an array was created, which may be
+useful in helping determine the kernel version that was in use at the time.
+
+When determining the scope of corruption, it may also be useful to know
+that the area susceptible to this corruption is limited to the area of the
+array after "MIN_DEVICE_SIZE * NUM_DEVICES".
+
+For new arrays you may choose either 1 or 2 - neither layout version is
+inherently better than the other.
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 1e772287b1c8e..e01cd52d71aa4 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -155,6 +155,8 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
 		pr_err("md/raid0:%s: cannot assemble multi-zone RAID0 with default_layout setting\n",
 		       mdname(mddev));
 		pr_err("md/raid0: please set raid0.default_layout to 1 or 2\n");
+		pr_err("Read the following page for more information:\n");
+		pr_err("https://www.kernel.org/doc/html/latest/admin-guide/md.html#multi-zone-raid0-layout-migration\n");
 		err = -ENOTSUPP;
 		goto abort;
 	}
-- 
2.24.0.rc1
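
As an illustration of the settings the new documentation describes (not part
of the patch), here is a minimal sketch of how an administrator might apply
the pre-3.14 layout. It assumes a hypothetical array /dev/md0, a hypothetical
modprobe.d file name, and that "Creation Time" is the mdadm --detail field of
interest; exact paths and boot-loader mechanics vary by distribution:

    # Check when the array was created, as a hint to which kernel
    # versions may have written to it.
    mdadm --detail /dev/md0 | grep 'Creation Time'

    # Option 1: kernel command line, appended to the boot loader's
    # kernel arguments (shown here as a comment, not a command):
    #     raid0.default_layout=1

    # Option 2: module parameter, applied whenever the raid0 module
    # loads (hypothetical file name; any modprobe.d fragment works):
    echo 'options raid0 default_layout=1' > /etc/modprobe.d/raid0-layout.conf

    # Option 3: per-array sysfs attribute; per the documentation above,
    # this only takes effect while the array is stopped:
    echo 1 > /sys/block/md0/md/layout

Options 1 and 2 set the default for every multi-zone RAID0 array on the
system, while option 3 is per-array but has to be repeated on every boot
until mdadm can record the layout in the array metadata.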