On 5/22/19 8:24 PM, Song Liu wrote:
> Hi Thorsten,
>
> On Wed, May 22, 2019 at 9:19 AM Song Liu <liu.song.a23@xxxxxxxxx> wrote:
>>
>> Hi Thorsten,
>>
>> Thanks for the report. I will follow up with stable@ to fix them.
>>
>> Best regards,
>> Song
>
> Could you please confirm the following patches fix the issue?
>
> commit a25d8c327bb4 ("Revert "Don't jump to compute_result state from
> check_result state"")
> commit b2176a1dfb51 ("md/raid: raid5 preserve the writeback action
> after the parity check")

Hello Song.

With the two patches applied to Linux 5.1.4, I was not able to reproduce
the previously observed file system and data corruption when replacing a
disk of a RAID6 array.

Thorsten

>
> Thanks,
> Song
>
>
>>
>> On Wed, May 22, 2019 at 5:26 AM Thorsten Knabe <linux@xxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hello.
>>>
>>> BUG: RAID6 recovery broken by commit
>>> 4f4fd7c5798bbdd5a03a60f6269cf1177fbd11ef (Linux 5.1.3+)
>>>
>>> Replacing a failed disk of an MD RAID6 array causes file system
>>> corruption and data loss on kernels containing commit
>>> 4f4fd7c5798bbdd5a03a60f6269cf1177fbd11ef.
>>>
>>> Affected kernels: 5.1.3, 5.1.4, possibly others.
>>> Unaffected kernels: 5.1.2
>>>
>>> OS: Debian stretch amd64
>>>
>>> Steps to reproduce the BUG:
>>>
>>> 1. Create a new 4-disk RAID6 array, create a file system and mount it:
>>>    mdadm /dev/md0 --create -l 6 -n 4 /dev/sd[bcde]
>>>    mkfs.ext4 /dev/md0
>>>    mount /dev/md0 /mnt
>>> 2. Store some data (a few GB should be fine) on the RAID6 array's file
>>>    system:
>>>    cp -r whatever /mnt
>>> 3. Fail a disk of the RAID6 array and remove it from the array:
>>>    mdadm /dev/md0 --fail /dev/sdd
>>>    mdadm /dev/md0 --remove /dev/sdd
>>> 4. Drop caches:
>>>    echo "3" > /proc/sys/vm/drop_caches
>>> 5. Compare data copied to the RAID6 array in step 2 with its source:
>>>    diff -r whatever /mnt/whatever
>>>    There should be no differences and no file system errors.
>>> 6. Add a new empty disk to the RAID6 array:
>>>    mdadm /dev/md0 --add /dev/sdf
>>> 7. RAID6 recovery should start now. Wait for the recovery to finish.
>>> 8. Drop caches again:
>>>    echo "3" > /proc/sys/vm/drop_caches
>>> 9. Compare data copied to the RAID6 array in step 2 with its source again:
>>>    diff -r whatever /mnt/whatever
>>>    diff now reports a lot of differences and the kernel log gets filled
>>>    with file system errors. For example:
>>>    EXT4-fs warning (device md0): ext4_dirent_csum_verify:355: inode
>>>    #918549: comm diff: No space for directory leaf checksum. Please run
>>>    e2fsck -D.
>>>
>>> Reverting commit 4f4fd7c5798bbdd5a03a60f6269cf1177fbd11ef from kernel
>>> 5.1.4 resolves the issues described above.
>>>
>>> Kind regards
>>> Thorsten
>>>
>>>
>>> --
>>> ___
>>>  |        | /                 E-Mail: linux@xxxxxxxxxxxxxxxxx
>>>  |horsten |/\nabe                WWW: http://linux.thorsten-knabe.de
>>>

--
___
 |        | /                 E-Mail: linux@xxxxxxxxxxxxxxxxx
 |horsten |/\nabe                WWW: http://linux.thorsten-knabe.de
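
For anyone re-testing, the reproduction steps quoted above can be collected into one script. This is only a minimal sketch: the device names and the "whatever" directory come from the report, while DRY_RUN, run() and repro_steps() are hypothetical helpers added here for illustration. Using "mdadm --wait" for step 7 is also an assumption; the report only says to wait for recovery to finish, and the call may return early if recovery has not started yet. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the report.
# WARNING: with DRY_RUN=0 this destroys all data on the listed disks.
# DRY_RUN, run() and repro_steps() are hypothetical names, not from the report.

DRY_RUN="${DRY_RUN:-1}"   # default: print commands without executing them
MD=/dev/md0
SRC=whatever              # source directory, as named in the report

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

repro_steps() {
    # Step 1: create the array, make a file system, mount it.
    run mdadm "$MD" --create -l 6 -n 4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    run mkfs.ext4 "$MD"
    run mount "$MD" /mnt
    # Step 2: store some data.
    run cp -r "$SRC" /mnt
    # Step 3: fail and remove one disk.
    run mdadm "$MD" --fail /dev/sdd
    run mdadm "$MD" --remove /dev/sdd
    # Steps 4-5: drop caches, then compare (no differences expected).
    run sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    run diff -r "$SRC" "/mnt/$SRC"
    # Steps 6-7: add a fresh disk and wait for recovery (assumed: mdadm --wait).
    run mdadm "$MD" --add /dev/sdf
    run mdadm --wait "$MD"
    # Steps 8-9: drop caches and compare again.
    run sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    run diff -r "$SRC" "/mnt/$SRC"
}
```

Calling repro_steps with DRY_RUN unset lists the twelve commands for review. Setting DRY_RUN=0 and running as root executes them against the real disks, so it should only be done on a scratch machine.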