Thanks Dann and Neil! > On Oct 28, 2019, at 3:36 PM, NeilBrown <neilb@xxxxxxx> wrote: > > On Mon, Oct 28 2019, dann frazier wrote: > >> On Fri, Oct 25, 2019 at 04:07:50PM -0700, Song Liu wrote: >>> On Fri, Oct 25, 2019 at 12:16 PM dann frazier >>> <dann.frazier@xxxxxxxxxxxxx> wrote: >>>> >>>> hey, >>>> I recently hit the raid0 default_layout issue and found the warning >>>> message to be lacking (and google hits show that I'm not the >>>> only one). It wasn't clear to me that it was referring to a kernel >>>> parameter, also wasn't clear how I should decide which setting to use, >>>> and definitely not clear that picking the wrong one could *cause >>>> corruption* (which, AIUI, is the case). What are your thoughts on >>>> adding a more admin-friendly explanation/steps on resolving the >>>> problem to the admin-guide, and include a URL to it in the warning >>>> message? As prior art: >>>> >>>> https://github.com/torvalds/linux/commit/6a012288d6906fee1dbc244050ade1dafe4a9c8d >>>> >>>> If the idea has merit, let me know if you'd like me to take a stab at >>>> a strawdog. >>> >>> I think it is good to provide some more information for the admin. >>> But I am not sure whether we need to point to a url, or just put >>> everything in the message. >>> >>> Please feel free to try add this information and submit patches. >> >> Hi Song, >> Here's an RFC of what I'm thinking - but drafting this has led me to >> more questions: >> >> - It seems like we've broken backwards compatibility when creating >> new multi-zone arrays. I expected that mdadm would still let me >> create an array w/o a kernel cmdline option in-effect, and w/o >> specifying a specific layout, and just choose a default. >> But even w/ latest mdadm git, it doesn't: > > Yes. mdadm should let you do this, but it doesn't. No one has written > the code yet. > >> >> $ sudo ./mdadm --create /dev/md0 --run --metadata=default \ >> --homehost=akis --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1 >> mdadm: /dev/vdb1 appears to be part of a raid array: >> level=raid0 devices=2 ctime=Mon Oct 28 18:55:38 2019 >> mdadm: /dev/vdc1 appears to be part of a raid array: >> level=raid0 devices=2 ctime=Mon Oct 28 18:55:38 2019 >> mdadm: RUN_ARRAY failed: Unknown error 524 >> >> It also doesn't let me specify a layout: >> $ sudo ./mdadm --create /dev/md0 --run --metadata=default \ >> --homehost=akis --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1 --layout=2 >> mdadm: layout not meaningful for raid0 arrays. >> >> - Once you've decided on a layout to use, how do you make that a property >> of the array itself, and therefore avoid having to set default_layout >> on any machine that uses it? Seems like we'd want to "bake that in" >> to the array itself. I expected that if I created a new array on a >> recent kernel that the array would remember the layout. But after >> creating a new array on 5.4-rc5 w/ default_layout set, I rebooted >> w/o a raid0.default_layout, and the kernel still refused to start >> it. >> >> Note: I realize my observation above does not match the text below >> in the 3rd paragraph, but I just left it for now until I learn the >> "right way". mdadm should have code to set it in struct mdp_superblock_1. >> >> diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst >> index 3c51084ffd379..5364c514d7926 100644 >> --- a/Documentation/admin-guide/md.rst >> +++ b/Documentation/admin-guide/md.rst >> @@ -759,3 +759,37 @@ These currently include: >> >> ppl_write_hint >> NVMe stream ID to be set for each PPL write request. >> + >> +Multi-Zone RAID0 Layout Migration >> +---------------------- >> +An unintentional RAID0 layout change was introduced in the v3.14 kernel. >> +This effectively means there are 2 different layouts Linux will use to >> +write data to RAID0 arrays in the wild - the "pre-3.14" way and the >> +"post-3.14" way. Mixing these layouts by writing to an array while booted >> +on these different kernel versions can lead to corruption. > > Says "pre-3.14" and "post-3.14" leaves it unclear what happens with > 3.14. > Maybe "pre.3.14" and "3.14 and later" ?? > >> + >> +Note that this only impacts RAID0 arrays that include devices of different >> +sizes. If your devices are all the same size, both layouts are equivalent, >> +and your array is not at risk of corruption due to this issue. >> + >> +The kernel has since been updated to record a layout version when creating >> +new arrays. Unfortunately, the kernel can not detect which layout was used ^^^^^ "cannot" Feel free to make changes and submit this as an official patch. Thanks again, Song > > As you note in the intro, the kernel doesn't record the layout version. > The kernel doesn't change the v1.x metadata at all where possible. That > is left for mdadm to do. > >> +for writes to pre-existing arrays, and therefore requires input from the >> +administrator. This input can be provided via the >> +``raid0.default_layout=<N>`` kernel command line parameter, or via the >> +``layout`` attribute in the sysfs filesystem (but only when the array is >> +stopped). >> + >> +Which layout version should I use? >> +++++++++++++++++++++++++++++++++++ >> +If your RAID array has only been written to by a >= 3.14 kernel, then you >> +should specify version 2. If your kernel has only been written to by a < 3.14 >> +kernel, then you should specify version 1. If the array may have already been >> +written to by both kernels < 3.14 and >= 3.14, then it is possible that your >> +data has already suffered corruption. Note that ``mdadm --detail`` will show >> +you when an array was created, which may be useful in helping determine the >> +kernel version that was in-use at the time. >> + >> +When determining the scope of corruption, it maybe also be useful to know >> +that the area susceptible to this corruption is limited to the area of the >> +array after "MIN_DEVICE_SIZE * NUM DEVICES". >> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c >> index 1e772287b1c8e..e01cd52d71aa4 100644 >> --- a/drivers/md/raid0.c >> +++ b/drivers/md/raid0.c >> @@ -155,6 +155,8 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf) >> pr_err("md/raid0:%s: cannot assemble multi-zone RAID0 with default_layout setting\n", >> mdname(mddev)); >> pr_err("md/raid0: please set raid0.default_layout to 1 or 2\n"); >> + pr_err("Read the following page for more information:\n"); >> + pr_err("https://www.kernel.org/doc/html/latest/admin-guide/md.html#multi-zone-raid0-layout-migration\n"); >> err = -ENOTSUPP; >> goto abort; >> } > > I think it is an excellent idea to write this document and put the link > in the code - thanks. > > NeilBrown