Re: [PATCH] md/raid0: drop discard support past the first zone for the original layout (stable fix)

Song Liu <song@xxxxxxxxxx> · Wed, 28 Jun 2023 15:43:00 -0700

Hi Jason,

Thanks for the fix!

On Wed, Jun 28, 2023 at 7:32 AM Jason Baron <jbaron@xxxxxxxxxx> wrote:
>
> We've found that using raid0 with the 'original' layout and discard
> enabled with different disk sizes (such that at least two zones are
> created) can result in data corruption. This is due to the fact that
> the discard handling in 'raid0_handle_discard()' assumes the 'alternate'
> layout. We've seen this corruption using ext4 but other filesystems are
> likely susceptible as well.
>
> More specifically, while multiple zones are necessary to create the
> corruption, the corruption may not occur with multiple zones if they
> layout in such a way the layout matches what the 'alternate' layout
> would have produced. Thus, not all raid0 devices with the 'original'
> layout, different size disks and discard enabled will encounter this
> corruption.
>
> The 3.14 kernel inadvertently changed the raid0 disk layout for different
> size disks. Thus, running a pre-3.14 kernel and post-3.14 kernel on the
> same raid0 array could corrupt data. This lead to the creation of the
> 'original' layout (to match the pre-3.14 layout) and the 'alternate' layout
> (to match the post 3.14 layout) in the 5.4 kernel time frame and an option
> to tell the kernel which layout to use (since it couldn't be autodetected).
> However, when the 'original' layout was added back to 5.4 discard support
> for the 'original' layout was not added leading this issue.
>
> I've been able to reliably reproduce the corruption with the following
> test case:
>
> 1. create raid0 array with different size disks using original layout
> 2. mkfs
> 3. mount -o discard
> 4. create lots of files
> 5. remove 1/2 the files
> 6. fstrim -a (or just the mount point for the raid0 array)
> 7. umount
> 8. fsck -fn /dev/md0 (spews all sorts of corruptions)
>
> The fix here is a minimal fix intended for stable trees which doesn't do
> discard if we have the original layout and we are not in first zone or
> if the i/o crosses zones (we either do the entire discard or none of it).
> The proper fix to actually perform discard to all zones for the original
> layout will land in upstream versions. We have implemented the minimal
> fix here for stable branches to reduce risk.
>
> I've verified the change using the reproducer mentioned above. Typically,
> the corruption is seen after less than 3 iterations, while the patch has
> run 500+ iterations.
>
> Cc: NeilBrown <neilb@xxxxxxx>
> Cc: Song Liu <song@xxxxxxxxxx>
> Fixes: c84a1372df92 ("md/raid0: avoid RAID0 data corruption due to layout confusion.")
> Signed-off-by: Jason Baron <jbaron@xxxxxxxxxx>
> ---
>  drivers/md/raid0.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index f8ee9a95e25d..2713a4acb44f 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -444,10 +444,25 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
>         sector_t end_disk_offset;
>         unsigned int end_disk_index;
>         unsigned int disk;
> +       bool bio_zone_overlap = false;
>
>         zone = find_zone(conf, &start);
> +       if (bio_end_sector(bio) > zone->zone_end)
> +               bio_zone_overlap = true;
> +
> +       /* The discard code below doesn't properly support the original
> +        * layout for the zones above the first one. We are adding
> +        * proper support in later kernel versions but have decided
> +        * that dropping discard support here is the lower risk option
> +        * to avoid data corruption for stable versions.
> +        */
> +       if ((conf->layout == RAID0_ORIG_LAYOUT) &&
> +           ((zone != conf->strip_zone) || (bio_zone_overlap))) {
> +               bio_endio(bio);
> +               return;
> +       }

For bio_zone_overlap case, I think we can still do the split below and
send discard to the first zone?

Song

>
> -       if (bio_end_sector(bio) > zone->zone_end) {
> +       if (bio_zone_overlap) {
>                 struct bio *split = bio_split(bio,
>                         zone->zone_end - bio->bi_iter.bi_sector, GFP_NOIO,
>                         &mddev->bio_set);
> --
> 2.25.1
>