Re: admin-guide page for raid0 layout issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks Dann and Neil!

> On Oct 28, 2019, at 3:36 PM, NeilBrown <neilb@xxxxxxx> wrote:
> 
> On Mon, Oct 28 2019, dann frazier wrote:
> 
>> On Fri, Oct 25, 2019 at 04:07:50PM -0700, Song Liu wrote:
>>> On Fri, Oct 25, 2019 at 12:16 PM dann frazier
>>> <dann.frazier@xxxxxxxxxxxxx> wrote:
>>>> 
>>>> hey,
>>>>  I recently hit the raid0 default_layout issue and found the warning
>>>> message to be lacking (and google hits show that I'm not the
>>>> only one). It wasn't clear to me that it was referring to a kernel
>>>> parameter, also wasn't clear how I should decide which setting to use,
>>>> and definitely not clear that picking the wrong one could *cause
>>>> corruption* (which, AIUI, is the case). What are your thoughts on
>>>> adding a more admin-friendly explanation/steps on resolving the
>>>> problem to the admin-guide, and include a URL to it in the warning
>>>> message? As prior art:
>>>> 
>>>>  https://github.com/torvalds/linux/commit/6a012288d6906fee1dbc244050ade1dafe4a9c8d
>>>> 
>>>> If the idea has merit, let me know if you'd like me to take a stab at
>>>> a strawdog.
>>> 
>>> I think it is good to provide some more information for the admin.
>>> But I am not sure whether we need to point to a url, or just put
>>> everything in the message.
>>> 
>>> Please feel free to try add this information and submit patches.
>> 
>> Hi Song,
>>  Here's an RFC of what I'm thinking - but drafting this has led me to
>> more questions:
>> 
>> - It seems like we've broken backwards compatibility when creating
>>   new multi-zone arrays. I expected that mdadm would still let me
>>   create an array w/o a kernel cmdline option in-effect, and w/o
>>   specifying a specific layout, and just choose a default.
>>   But even w/ latest mdadm git, it doesn't:
> 
> Yes.  mdadm should let you do this, but it doesn't.  No one has written
> the code yet.
> 
>> 
>>    $ sudo ./mdadm --create /dev/md0 --run --metadata=default \
>>      --homehost=akis --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
>>    mdadm: /dev/vdb1 appears to be part of a raid array:
>>           level=raid0 devices=2 ctime=Mon Oct 28 18:55:38 2019
>>    mdadm: /dev/vdc1 appears to be part of a raid array:
>>           level=raid0 devices=2 ctime=Mon Oct 28 18:55:38 2019
>>    mdadm: RUN_ARRAY failed: Unknown error 524
>> 
>>  It also doesn't let me specify a layout:
>>    $ sudo ./mdadm --create /dev/md0 --run --metadata=default \
>>      --homehost=akis --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1 --layout=2
>>    mdadm: layout not meaningful for raid0 arrays.
>> 
>> - Once you've decided on a layout to use, how do you make that a property
>>   of the array itself, and therefore avoid having to set default_layout 
>>   on any machine that uses it? Seems like we'd want to "bake that in"
>>   to the array itself. I expected that if I created a new array on a
>>   recent kernel that the array would remember the layout. But after
>>   creating a new array on 5.4-rc5 w/ default_layout set, I rebooted
>>   w/o a raid0.default_layout, and the kernel still refused to start
>>   it.
>> 
>>   Note: I realize my observation above does not match the text below
>>   in the 3rd paragraph, but I just left it for now until I learn the
>>   "right way".

mdadm should have code to set it in struct mdp_superblock_1. 

>> 
>> diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
>> index 3c51084ffd379..5364c514d7926 100644
>> --- a/Documentation/admin-guide/md.rst
>> +++ b/Documentation/admin-guide/md.rst
>> @@ -759,3 +759,37 @@ These currently include:
>> 
>>   ppl_write_hint
>>       NVMe stream ID to be set for each PPL write request.
>> +
>> +Multi-Zone RAID0 Layout Migration
>> +----------------------
>> +An unintentional RAID0 layout change was introduced in the v3.14 kernel.
>> +This effectively means there are 2 different layouts Linux will use to
>> +write data to RAID0 arrays in the wild - the "pre-3.14" way and the
>> +"post-3.14" way. Mixing these layouts by writing to an array while booted
>> +on these different kernel versions can lead to corruption.
> 
> Says "pre-3.14" and "post-3.14" leaves it unclear what happens with
> 3.14.
> Maybe "pre.3.14" and "3.14 and later" ??
> 
>> +
>> +Note that this only impacts RAID0 arrays that include devices of different
>> +sizes. If your devices are all the same size, both layouts are equivalent,
>> +and your array is not at risk of corruption due to this issue.
>> +
>> +The kernel has since been updated to record a layout version when creating
>> +new arrays. Unfortunately, the kernel can not detect which layout was used
                                          ^^^^^ "cannot"

Feel free to make changes and submit this as an official patch. 

Thanks again,
Song

> 
> As you note in the intro, the kernel doesn't record the layout version.
> The kernel doesn't change the v1.x metadata at all where possible. That
> is left for mdadm to do.
> 
>> +for writes to pre-existing arrays, and therefore requires input from the
>> +administrator. This input can be provided via the
>> +``raid0.default_layout=<N>`` kernel command line parameter, or via the
>> +``layout`` attribute in the sysfs filesystem (but only when the array is
>> +stopped).
>> +
>> +Which layout version should I use?
>> +++++++++++++++++++++++++++++++++++
>> +If your RAID array has only been written to by a >= 3.14 kernel, then you
>> +should specify version 2. If your kernel has only been written to by a < 3.14
>> +kernel, then you should specify version 1. If the array may have already been
>> +written to by both kernels < 3.14 and >= 3.14, then it is possible that your
>> +data has already suffered corruption. Note that ``mdadm --detail`` will show
>> +you when an array was created, which may be useful in helping determine the
>> +kernel version that was in-use at the time.
>> +
>> +When determining the scope of corruption, it maybe also be useful to know
>> +that the area susceptible to this corruption is limited to the area of the
>> +array after "MIN_DEVICE_SIZE * NUM DEVICES".
>> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
>> index 1e772287b1c8e..e01cd52d71aa4 100644
>> --- a/drivers/md/raid0.c
>> +++ b/drivers/md/raid0.c
>> @@ -155,6 +155,8 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
>> 		pr_err("md/raid0:%s: cannot assemble multi-zone RAID0 with default_layout setting\n",
>> 		       mdname(mddev));
>> 		pr_err("md/raid0: please set raid0.default_layout to 1 or 2\n");
>> +		pr_err("Read the following page for more information:\n");
>> +		pr_err("https://www.kernel.org/doc/html/latest/admin-guide/md.html#multi-zone-raid0-layout-migration\n";);
>> 		err = -ENOTSUPP;
>> 		goto abort;
>> 	}
> 
> I think it is an excellent idea to write this document and put the link
> in the code - thanks.
> 
> NeilBrown





[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux