Hi Jason, On Tue, Jun 20, 2023 at 2:00 PM Jason Baron <jbaron@xxxxxxxxxx> wrote: > > Hi, > > SUMMARY > ======= > > We've found that using raid0 with the 'original' layout and discard enabled > with different disk sizes (such that at least two zones are created) can > result in data corruption. This is due to the fact that the discard > handling in 'raid0_handle_discard()' assumes the 'alternate' layout. > > More specifically, while multiple zones are necessary to create the > corruption, the corruption may not occur with multiple zones if they > layout in such a way the layout matches what the 'alternate' layout > would have produced. Thus, not all raid0 devices with the 'original' > layout, different size disks and discard enabled will encounter this > corruption. > > BACKGROUND > ========== > > The 3.14 kernel inadvertleny changed the raid0 disk layout for different > size disks. Thus, running a pre-3.14 kernel and post-3.14 kernel on the > same raid0 array could corrupt data. This lead to the creation of the > 'original' layout (to match the pre-3.14 layout) and the 'alternate' layout > (to match the post 3.14 layout) in the 5.4 kernel time frame and an option > to tell the kernel which layout to use (since it couldn't be autodetected) > [1]. The mdadm tool also added support for writing the layout to the raid0 > superblock which the kernel honors. The mdadm tool defaults to the original > layout unless --layout is given (--layout=alternate or --layout=original). > > REPRODUCER (pseudocode) > ======================= > > The below steps were the simplest we found to consistently reproduced the > corruption: > > 1. create raid0 array with different size disks using original layout > 2. mkfs > 3. mount -o discard > 4. create lots of files > 5. remove 1/2 the files > 6. fstrim -a (or just the mount point for the raid0 array) > 7. umount > 8. fsck -fn /dev/md1 > > Here is an example corruption, though we've seen many others: > > fsck from util-linux 2.34 > e2fsck 1.45.5 (07-Jan-2020) > Pass 1: Checking inodes, blocks, and sizes > Pass 2: Checking directory structure > Directory inode 24557, block #0, offset 0: directory has no checksum. > Fix? no > > Directory inode 24557, block #0, offset 0: directory corrupted > Salvage? no > > e2fsck: aborted > > /dev/md1: ********** WARNING: Filesystem still has errors ********** > > > VULNERABLE VERSIONS > =================== > > We've confirmed the corruption with the latest 6.4-rc7 kernel. Kernels that > support the 'original' layout could have this issue. The 'original' layout > was added to the 5.4+ kernels, although the 'original' layout may have been > backported to earlier kernels. > > Kernels versions from 3.14 to 5.4 that only have support for the 'alternate' > layout should be ok. I'm not sure about pre-3.14 kernels that only have the > 'original' layout. One would have to go back and see if they support > discard and if that support is correct. > > WORKAROUNDS > =========== > > Until a proper code fix is implemented in the kernel and updated, valid > workarounds we have found are any 1 of the following: > > 1. disable discard and fstrim from running on the raid0 array > 2. create new arrays only with the 'alternate' layout, the 'mdadm' > command understands mdadm --layout=alternate > 3. ensure there is only 1 raid0 zone (if disks are all the same size) > > That said, you may already have 'silent corruption' if data is not covered > by a checksum. > > FIXING > ======= > > It may make sense to have the kernel just disable discard for raid0 with > the 'original' layout, and update mdadm to default to 'alternate' and treat > the 'original' layout more as a deprecated format? I'm curious as to why > mdadm is using 'original' layout as the default for new arrays? > > Alternatively, we could patch the 'original' layout and add proper discard > support. Thanks for the detailed report and analysis! We will look into fixing it. Song > > THANKS > ====== > > Thanks to Michael Zhivich, Debabrata Banerjee and Jeff Dike for helping uncover this issue. > > Thanks, > > -Jason > > [1] https://wiki.ubuntu.com/Kernel/Raid0LayoutMigration