On Fri, May 24, 2019 at 1:11 AM Thorsten Knabe <linux@xxxxxxxxxxxxxxxxx> wrote:
>
> On 5/22/19 8:24 PM, Song Liu wrote:
> > Hi Thorsten,
> >
> > On Wed, May 22, 2019 at 9:19 AM Song Liu <liu.song.a23@xxxxxxxxx> wrote:
> >>
> >> Hi Thorsten,
> >>
> >> Thanks for the report. I will follow up with stable@ to fix them.
> >>
> >> Best regards,
> >> Song
> >
> > Could you please confirm that the following patches fix the issue?
> >
> > commit a25d8c327bb4 ("Revert "Don't jump to compute_result state from
> > check_result state"")
> > commit b2176a1dfb51 ("md/raid: raid5 preserve the writeback action
> > after the parity check")
>
> Hello Song.
>
> With the two patches applied to Linux 5.1.4 I was not able to reproduce
> the previously observed file system and data corruptions by replacing a
> disk of a RAID6 array.
>
> Thorsten

Thanks for testing the fix!

Song

> >
> > Thanks,
> > Song
> >
> >
> >>
> >> On Wed, May 22, 2019 at 5:26 AM Thorsten Knabe <linux@xxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>> Hello.
> >>>
> >>> BUG: RAID6 recovery broken by commit
> >>> 4f4fd7c5798bbdd5a03a60f6269cf1177fbd11ef (Linux 5.1.3+)
> >>>
> >>> Replacing a failed disk of an MD RAID6 array causes file system
> >>> corruption and data loss on kernels containing commit
> >>> 4f4fd7c5798bbdd5a03a60f6269cf1177fbd11ef.
> >>>
> >>> Affected kernels: 5.1.3, 5.1.4, possibly others.
> >>> Unaffected kernels: 5.1.2
> >>>
> >>> OS: Debian stretch amd64
> >>>
> >>> Steps to reproduce the BUG:
> >>>
> >>> 1. Create a new 4-disk RAID6 array, create a file system and mount it:
> >>>    mdadm /dev/md0 --create -l 6 -n 4 /dev/sd[bcde]
> >>>    mkfs.ext4 /dev/md0
> >>>    mount /dev/md0 /mnt
> >>> 2. Store some data (a few GB should be fine) on the RAID6 array's file
> >>>    system:
> >>>    cp -r whatever /mnt
> >>> 3. Fail a disk of the RAID6 array and remove it from the array:
> >>>    mdadm /dev/md0 --fail /dev/sdd
> >>>    mdadm /dev/md0 --remove /dev/sdd
> >>> 4. Drop caches:
> >>>    echo "3" > /proc/sys/vm/drop_caches
> >>> 5. Compare data copied to the RAID6 array in step 2 with its source:
> >>>    diff -r whatever /mnt/whatever
> >>>    There should be no differences and no file system errors.
> >>> 6. Add a new empty disk to the RAID6 array:
> >>>    mdadm /dev/md0 --add /dev/sdf
> >>> 7. RAID6 recovery should start now, wait for the RAID6 recovery to finish.
> >>> 8. Drop caches again:
> >>>    echo "3" > /proc/sys/vm/drop_caches
> >>> 9. Compare data copied to the RAID6 array in step 2 with its source again:
> >>>    diff -r whatever /mnt/whatever
> >>>    diff now reports a lot of differences and the kernel log gets filled
> >>>    with file system errors. For example:
> >>>    EXT4-fs warning (device md0): ext4_dirent_csum_verify:355: inode
> >>>    #918549: comm diff: No space for directory leaf checksum. Please run
> >>>    e2fsck -D.
> >>>
> >>> Reverting commit 4f4fd7c5798bbdd5a03a60f6269cf1177fbd11ef from kernel
> >>> 5.1.4 resolves the issues described above.
> >>>
> >>> Kind regards
> >>> Thorsten
> >>>
> >>>
> >>> --
> >>>  ___
> >>>   |        | /                 E-Mail: linux@xxxxxxxxxxxxxxxxx
> >>>   |horsten |/\nabe                WWW: http://linux.thorsten-knabe.de
> >>>
> >
>
> --
>  ___
>   |        | /                 E-Mail: linux@xxxxxxxxxxxxxxxxx
>   |horsten |/\nabe                WWW: http://linux.thorsten-knabe.de
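
For anyone who wants to re-run the check on a scratch machine, the nine steps quoted above can be collected into one script. This is only a sketch, not something posted in the thread. The helper names (run, drop_caches, compare, reproduce) and the DRY_RUN and SRC variables are made up for illustration. The device names are the ones from the report, with /dev/sd[bcde] written out in full. Step 7 only says to wait for recovery; using "mdadm --wait" for that is my assumption. By default the script only prints the commands, because running them for real destroys all data on the listed disks.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the report. Dry run by default.
# ASSUMPTIONS (not from the thread): DRY_RUN, SRC and all helper names are
# hypothetical; "mdadm --wait" stands in for "wait for recovery to finish".
MD=/dev/md0
MNT=/mnt
SRC=${SRC:-whatever}        # test data directory, "whatever" in the report
DRY_RUN=${DRY_RUN:-1}       # set to 0 to really run (destroys disk contents)

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 4 and 8: drop caches so later reads come from the array itself.
drop_caches() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ echo 3 > /proc/sys/vm/drop_caches"
    else
        echo 3 > /proc/sys/vm/drop_caches
    fi
}

# Steps 5 and 9: compare the copy on the array with its source.
compare() {
    run diff -r "$SRC" "$MNT/$(basename "$SRC")"
}

reproduce() {
    # Step 1: create the 4-disk RAID6 array, make a file system, mount it.
    run mdadm "$MD" --create -l 6 -n 4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    run mkfs.ext4 "$MD"
    run mount "$MD" "$MNT"
    # Step 2: store some data on the array.
    run cp -r "$SRC" "$MNT"
    # Step 3: fail one disk and remove it.
    run mdadm "$MD" --fail /dev/sdd
    run mdadm "$MD" --remove /dev/sdd
    # Steps 4-5: the degraded array should still return correct data.
    drop_caches
    compare
    # Steps 6-7: add a new empty disk and wait for recovery.
    run mdadm "$MD" --add /dev/sdf
    run mdadm --wait "$MD"
    # Steps 8-9: on an affected kernel this second diff reports differences.
    drop_caches
    compare
}

reproduce
```

With DRY_RUN left at 1 the script prints one "+ ..." line per command, which makes it easy to check the sequence against the report before running anything as root.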