On Fri, Oct 25, 2019 at 04:07:50PM -0700, Song Liu wrote:
> On Fri, Oct 25, 2019 at 12:16 PM dann frazier
> <dann.frazier@xxxxxxxxxxxxx> wrote:
> >
> > hey,
> >   I recently hit the raid0 default_layout issue and found the warning
> > message to be lacking (and google hits show that I'm not the
> > only one). It wasn't clear to me that it was referring to a kernel
> > parameter, also wasn't clear how I should decide which setting to use,
> > and definitely not clear that picking the wrong one could *cause
> > corruption* (which, AIUI, is the case). What are your thoughts on
> > adding a more admin-friendly explanation/steps on resolving the
> > problem to the admin-guide, and include a URL to it in the warning
> > message? As prior art:
> >
> > https://github.com/torvalds/linux/commit/6a012288d6906fee1dbc244050ade1dafe4a9c8d
> >
> > If the idea has merit, let me know if you'd like me to take a stab at
> > a strawdog.
>
> I think it is good to provide some more information for the admin.
> But I am not sure whether we need to point to a url, or just put
> everything in the message.
>
> Please feel free to try add this information and submit patches.

Hi Song,
  Here's an RFC of what I'm thinking - but drafting this has led me to
more questions:

 - It seems like we've broken backwards compatibility when creating new
   multi-zone arrays. I expected that mdadm would still let me create an
   array w/o a kernel cmdline option in-effect, and w/o specifying a
   specific layout, and just choose a default. But even w/ latest mdadm
   git, it doesn't:

     $ sudo ./mdadm --create /dev/md0 --run --metadata=default \
         --homehost=akis --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
     mdadm: /dev/vdb1 appears to be part of a raid array:
            level=raid0 devices=2 ctime=Mon Oct 28 18:55:38 2019
     mdadm: /dev/vdc1 appears to be part of a raid array:
            level=raid0 devices=2 ctime=Mon Oct 28 18:55:38 2019
     mdadm: RUN_ARRAY failed: Unknown error 524

   It also doesn't let me specify a layout:

     $ sudo ./mdadm --create /dev/md0 --run --metadata=default \
         --homehost=akis --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1 --layout=2
     mdadm: layout not meaningful for raid0 arrays.

 - Once you've decided on a layout to use, how do you make that a
   property of the array itself, and therefore avoid having to set
   default_layout on any machine that uses it? Seems like we'd want to
   "bake that in" to the array itself. I expected that if I created a
   new array on a recent kernel that the array would remember the
   layout. But after creating a new array on 5.4-rc5 w/ default_layout
   set, I rebooted w/o a raid0.default_layout, and the kernel still
   refused to start it.

Note: I realize my observation above does not match the text below in
the 3rd paragraph, but I just left it for now until I learn the "right
way".

diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
index 3c51084ffd379..5364c514d7926 100644
--- a/Documentation/admin-guide/md.rst
+++ b/Documentation/admin-guide/md.rst
@@ -759,3 +759,37 @@ These currently include:
 
   ppl_write_hint
     NVMe stream ID to be set for each PPL write request.
+
+Multi-Zone RAID0 Layout Migration
+---------------------------------
+An unintentional RAID0 layout change was introduced in the v3.14 kernel.
+This effectively means there are 2 different layouts Linux will use to
+write data to RAID0 arrays in the wild - the "pre-3.14" way and the
+"post-3.14" way. Mixing these layouts by writing to an array while booted
+on these different kernel versions can lead to corruption.
+
+Note that this only impacts RAID0 arrays that include devices of different
+sizes. If your devices are all the same size, both layouts are equivalent,
+and your array is not at risk of corruption due to this issue.
+
+The kernel has since been updated to record a layout version when creating
+new arrays. Unfortunately, the kernel cannot detect which layout was used
+for writes to pre-existing arrays, and therefore requires input from the
+administrator. This input can be provided via the
+``raid0.default_layout=<N>`` kernel command line parameter, or via the
+``layout`` attribute in the sysfs filesystem (but only when the array is
+stopped).
+
+Which layout version should I use?
+++++++++++++++++++++++++++++++++++
+If your RAID array has only been written to by a >= 3.14 kernel, then you
+should specify version 2. If your array has only been written to by a < 3.14
+kernel, then you should specify version 1. If the array may have already been
+written to by both kernels < 3.14 and >= 3.14, then it is possible that your
+data has already suffered corruption. Note that ``mdadm --detail`` will show
+you when an array was created, which may be useful in helping determine the
+kernel version that was in use at the time.
+
+When determining the scope of corruption, it may also be useful to know
+that the area susceptible to this corruption is limited to the portion
+of the array beyond "MIN_DEVICE_SIZE * NUM_DEVICES".
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 1e772287b1c8e..e01cd52d71aa4 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -155,6 +155,8 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
 		pr_err("md/raid0:%s: cannot assemble multi-zone RAID0 with default_layout setting\n",
 		       mdname(mddev));
 		pr_err("md/raid0: please set raid0.default_layout to 1 or 2\n");
+		pr_err("Read the following page for more information:\n");
+		pr_err("https://www.kernel.org/doc/html/latest/admin-guide/md.html#multi-zone-raid0-layout-migration\n");
 		err = -ENOTSUPP;
 		goto abort;
 	}
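
As a rough illustration of the "which layout version" guidance in the new
documentation text above: the creation time that mdadm --detail reports can
be compared against when the machine first ran a >= 3.14 kernel. /dev/md0
below is only a placeholder, and the output line is just an example of the
format mdadm prints:

  $ sudo mdadm --detail /dev/md0 | grep 'Creation Time'
       Creation Time : Mon Oct 28 18:55:38 2019

  # An array created in 2019 was almost certainly created on a >= 3.14
  # kernel; an older array may have been written by kernels on both sides
  # of the change, so its write history needs to be considered carefully.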
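
And a minimal sketch of how the raid0.default_layout parameter might be
applied persistently on a GRUB-based distro. Everything here is an
assumption/placeholder: the value "2" is only an example (pick 1 or 2 per
the guidance above), and the exact config file and regeneration command
vary by distro:

  # /etc/default/grub: append the parameter to the existing kernel
  # command line. "2" assumes the array was only ever written by
  # >= 3.14 kernels.
  GRUB_CMDLINE_LINUX="... raid0.default_layout=2"

  $ sudo update-grub      # or: grub2-mkconfig -o /boot/grub2/grub.cfg
  $ sudo reboot

  # After rebooting, confirm the parameter made it onto the running
  # kernel's command line before starting or writing to the array:
  $ grep -o 'raid0.default_layout=[0-9]*' /proc/cmdline
  raid0.default_layout=2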