Hi Jason, Thanks for the fix! On Wed, Jun 28, 2023 at 7:32 AM Jason Baron <jbaron@xxxxxxxxxx> wrote: > > We've found that using raid0 with the 'original' layout and discard > enabled with different disk sizes (such that at least two zones are > created) can result in data corruption. This is due to the fact that > the discard handling in 'raid0_handle_discard()' assumes the 'alternate' > layout. We've seen this corruption using ext4 but other filesystems are > likely susceptible as well. > > More specifically, while multiple zones are necessary to create the > corruption, the corruption may not occur with multiple zones if they > layout in such a way the layout matches what the 'alternate' layout > would have produced. Thus, not all raid0 devices with the 'original' > layout, different size disks and discard enabled will encounter this > corruption. > > The 3.14 kernel inadvertently changed the raid0 disk layout for different > size disks. Thus, running a pre-3.14 kernel and post-3.14 kernel on the > same raid0 array could corrupt data. This lead to the creation of the > 'original' layout (to match the pre-3.14 layout) and the 'alternate' layout > (to match the post 3.14 layout) in the 5.4 kernel time frame and an option > to tell the kernel which layout to use (since it couldn't be autodetected). > However, when the 'original' layout was added back to 5.4 discard support > for the 'original' layout was not added leading this issue. > > I've been able to reliably reproduce the corruption with the following > test case: > > 1. create raid0 array with different size disks using original layout > 2. mkfs > 3. mount -o discard > 4. create lots of files > 5. remove 1/2 the files > 6. fstrim -a (or just the mount point for the raid0 array) > 7. umount > 8. fsck -fn /dev/md0 (spews all sorts of corruptions) > > The fix here is a minimal fix intended for stable trees which doesn't do > discard if we have the original layout and we are not in first zone or > if the i/o crosses zones (we either do the entire discard or none of it). > The proper fix to actually perform discard to all zones for the original > layout will land in upstream versions. We have implemented the minimal > fix here for stable branches to reduce risk. > > I've verified the change using the reproducer mentioned above. Typically, > the corruption is seen after less than 3 iterations, while the patch has > run 500+ iterations. > > Cc: NeilBrown <neilb@xxxxxxx> > Cc: Song Liu <song@xxxxxxxxxx> > Fixes: c84a1372df92 ("md/raid0: avoid RAID0 data corruption due to layout confusion.") > Signed-off-by: Jason Baron <jbaron@xxxxxxxxxx> > --- > drivers/md/raid0.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c > index f8ee9a95e25d..2713a4acb44f 100644 > --- a/drivers/md/raid0.c > +++ b/drivers/md/raid0.c > @@ -444,10 +444,25 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio) > sector_t end_disk_offset; > unsigned int end_disk_index; > unsigned int disk; > + bool bio_zone_overlap = false; > > zone = find_zone(conf, &start); > + if (bio_end_sector(bio) > zone->zone_end) > + bio_zone_overlap = true; > + > + /* The discard code below doesn't properly support the original > + * layout for the zones above the first one. We are adding > + * proper support in later kernel versions but have decided > + * that dropping discard support here is the lower risk option > + * to avoid data corruption for stable versions. > + */ > + if ((conf->layout == RAID0_ORIG_LAYOUT) && > + ((zone != conf->strip_zone) || (bio_zone_overlap))) { > + bio_endio(bio); > + return; > + } For bio_zone_overlap case, I think we can still do the split below and send discard to the first zone? Song > > - if (bio_end_sector(bio) > zone->zone_end) { > + if (bio_zone_overlap) { > struct bio *split = bio_split(bio, > zone->zone_end - bio->bi_iter.bi_sector, GFP_NOIO, > &mddev->bio_set); > -- > 2.25.1 >