Re: [PATCH] md/raid0: add discard support for the 'original' layout

Song Liu <song@xxxxxxxxxx> · Mon, 26 Jun 2023 17:35:29 -0700

On Fri, Jun 23, 2023 at 11:05 AM Jason Baron <jbaron@xxxxxxxxxx> wrote:
>
> We've found that using raid0 with the 'original' layout and discard
> enabled with different disk sizes (such that at least two zones are
> created) can result in data corruption. This is due to the fact that
> the discard handling in 'raid0_handle_discard()' assumes the 'alternate'
> layout. We've seen this corruption using ext4 but other filesystems are
> likely susceptible as well.
>
> More specifically, while multiple zones are necessary to create the
> corruption, the corruption may not occur with multiple zones if they
> layout in such a way the layout matches what the 'alternate' layout
> would have produced. Thus, not all raid0 devices with the 'original'
> layout, different size disks and discard enabled will encounter this
> corruption.
>
> The 3.14 kernel inadvertently changed the raid0 disk layout for different
> size disks. Thus, running a pre-3.14 kernel and post-3.14 kernel on the
> same raid0 array could corrupt data. This lead to the creation of the
> 'original' layout (to match the pre-3.14 layout) and the 'alternate' layout
> (to match the post 3.14 layout) in the 5.4 kernel time frame and an option
> to tell the kernel which layout to use (since it couldn't be autodetected).
> However, when the 'original' layout was added back to 5.4 discard support
> for the 'original' layout was not added leading this issue.
>
> I've been able to reliably reproduce the corruption with the following
> test case:
>
> 1. create raid0 array with different size disks using original layout
> 2. mkfs
> 3. mount -o discard
> 4. create lots of files
> 5. remove 1/2 the files
> 6. fstrim -a (or just the mount point for the raid0 array)
> 7. umount
> 8. fsck -fn /dev/md0 (spews all sorts of corruptions)
>
> Let's fix this by adding proper discard support to the 'original' layout.
> The fix 'maps' the 'original' layout disks to the order in which they are
> read/written such that we can compare the disks in the same way that the
> current 'alternate' layout does. A 'disk_shift' field is added to
> 'struct strip_zone'. This could be computed on the fly in
> raid0_handle_discard() but by adding this field, we save some computation
> in the discard path.
>
> Note we could also potentially fix this by re-ordering the disks in the
> zones that follow the first one, and then always read/writing them using
> the 'alternate' layout. However, that is seen as a more substantial change,
> and we are attempting the least invasive fix at this time to remedy the
> corruption.
>
> I've verified the change using the reproducer mentioned above. Typically,
> the corruption is seen after less than 3 iterations, while the patch has
> run 500+ iterations.
>
> Cc: NeilBrown <neilb@xxxxxxx>
> Cc: Song Liu <song@xxxxxxxxxx>
> Fixes: c84a1372df92 ("md/raid0: avoid RAID0 data corruption due to layout confusion.")
> Signed-off-by: Jason Baron <jbaron@xxxxxxxxxx>

 Looks good to me! Applied to md-next.

Since this will be released with 6.6, we should have a smaller and safer fix
before that. Would you mind create a patch that ignores all discards to
orig_layout and not the first zone? We will roll that to 6.5 and back port to
stable. Then this version will be shipped to 6.6+.

Thanks,
Song