On Fri, Oct 25, 2019 at 04:07:50PM -0700, Song Liu wrote:
> On Fri, Oct 25, 2019 at 12:16 PM dann frazier
> <dann.frazier@xxxxxxxxxxxxx> wrote:
> >
> > hey,
> >   I recently hit the raid0 default_layout issue and found the warning
> > message to be lacking (and google hits show that I'm not the
> > only one). It wasn't clear to me that it was referring to a kernel
> > parameter, also wasn't clear how I should decide which setting to use,
> > and definitely not clear that picking the wrong one could *cause
> > corruption* (which, AIUI, is the case). What are your thoughts on
> > adding a more admin-friendly explanation/steps on resolving the
> > problem to the admin-guide, and include a URL to it in the warning
> > message? As prior art:
> >
> > https://github.com/torvalds/linux/commit/6a012288d6906fee1dbc244050ade1dafe4a9c8d
> >
> > If the idea has merit, let me know if you'd like me to take a stab at
> > a strawdog.
>
> I think it is good to provide some more information for the admin.
> But I am not sure whether we need to point to a url, or just put
> everything in the message.
>
> Please feel free to try add this information and submit patches.

Hi Song,
  Here's an RFC of what I'm thinking - but drafting this has led me to
more questions:

 - It seems like we've broken backwards compatibility when creating new
   multi-zone arrays. I expected that mdadm would still let me create an
   array w/o a kernel cmdline option in-effect, and w/o specifying a
   specific layout, and just choose a default. But even w/ latest mdadm
   git, it doesn't:

     $ sudo ./mdadm --create /dev/md0 --run --metadata=default \
         --homehost=akis --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
     mdadm: /dev/vdb1 appears to be part of a raid array:
            level=raid0 devices=2 ctime=Mon Oct 28 18:55:38 2019
     mdadm: /dev/vdc1 appears to be part of a raid array:
            level=raid0 devices=2 ctime=Mon Oct 28 18:55:38 2019
     mdadm: RUN_ARRAY failed: Unknown error 524

   It also doesn't let me specify a layout:

     $ sudo ./mdadm --create /dev/md0 --run --metadata=default \
         --homehost=akis --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1 --layout=2
     mdadm: layout not meaningful for raid0 arrays.

 - Once you've decided on a layout to use, how do you make that a
   property of the array itself, and therefore avoid having to set
   default_layout on any machine that uses it? Seems like we'd want to
   "bake that in" to the array itself. I expected that if I created a
   new array on a recent kernel that the array would remember the
   layout. But after creating a new array on 5.4-rc5 w/ default_layout
   set, I rebooted w/o a raid0.default_layout, and the kernel still
   refused to start it.

Note: I realize my observation above does not match the text below in
the 3rd paragraph, but I just left it for now until I learn the "right
way".

diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
index 3c51084ffd379..5364c514d7926 100644
--- a/Documentation/admin-guide/md.rst
+++ b/Documentation/admin-guide/md.rst
@@ -759,3 +759,37 @@ These currently include:
 
   ppl_write_hint
     NVMe stream ID to be set for each PPL write request.
+
+Multi-Zone RAID0 Layout Migration
+---------------------------------
+An unintentional RAID0 layout change was introduced in the v3.14 kernel.
+This effectively means there are 2 different layouts Linux will use to
+write data to RAID0 arrays in the wild - the "pre-3.14" way and the
+"post-3.14" way. Mixing these layouts by writing to an array while booted
+on these different kernel versions can lead to corruption.
+
+Note that this only impacts RAID0 arrays that include devices of different
+sizes. If your devices are all the same size, both layouts are equivalent,
+and your array is not at risk of corruption due to this issue.
+
+The kernel has since been updated to record a layout version when creating
+new arrays. Unfortunately, the kernel cannot detect which layout was used
+for writes to pre-existing arrays, and therefore requires input from the
+administrator. This input can be provided via the
+``raid0.default_layout=<N>`` kernel command line parameter, or via the
+``layout`` attribute in the sysfs filesystem (but only when the array is
+stopped).
+
+Which layout version should I use?
+++++++++++++++++++++++++++++++++++
+If your RAID array has only been written to by a >= 3.14 kernel, then you
+should specify version 2. If your array has only been written to by a < 3.14
+kernel, then you should specify version 1. If the array may have already been
+written to by both kernels < 3.14 and >= 3.14, then it is possible that your
+data has already suffered corruption. Note that ``mdadm --detail`` will show
+you when an array was created, which may be useful in helping determine the
+kernel version that was in use at the time.
+
+When determining the scope of corruption, it may also be useful to know
+that the area susceptible to this corruption is limited to the portion
+of the array beyond "MIN_DEVICE_SIZE * NUM_DEVICES".
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 1e772287b1c8e..e01cd52d71aa4 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -155,6 +155,8 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
 		pr_err("md/raid0:%s: cannot assemble multi-zone RAID0 with default_layout setting\n",
 		       mdname(mddev));
 		pr_err("md/raid0: please set raid0.default_layout to 1 or 2\n");
+		pr_err("Read the following page for more information:\n");
+		pr_err("https://www.kernel.org/doc/html/latest/admin-guide/md.html#multi-zone-raid0-layout-migration\n");
 		err = -ENOTSUPP;
 		goto abort;
 	}
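
As a rough illustration of the "which layout version" guidance in the new
documentation text above: the creation time that mdadm --detail reports can
be compared against when the machine first ran a >= 3.14 kernel. /dev/md0
below is only a placeholder, and the output line is just an example of the
format mdadm prints:

  $ sudo mdadm --detail /dev/md0 | grep 'Creation Time'
       Creation Time : Mon Oct 28 18:55:38 2019

  # An array created in 2019 was almost certainly created on a >= 3.14
  # kernel; an older array may have been written by kernels on both sides
  # of the change, so its write history needs to be considered carefully.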
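
And a minimal sketch of how the raid0.default_layout parameter might be
applied persistently on a GRUB-based distro. Everything here is an
assumption/placeholder: the value "2" is only an example (pick 1 or 2 per
the guidance above), and the exact config file and regeneration command
vary by distro:

  # /etc/default/grub: append the parameter to the existing kernel
  # command line. "2" assumes the array was only ever written by
  # >= 3.14 kernels.
  GRUB_CMDLINE_LINUX="... raid0.default_layout=2"

  $ sudo update-grub      # or: grub2-mkconfig -o /boot/grub2/grub.cfg
  $ sudo reboot

  # After rebooting, confirm the parameter made it onto the running
  # kernel's command line before starting or writing to the array:
  $ grep -o 'raid0.default_layout=[0-9]*' /proc/cmdline
  raid0.default_layout=2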