Re: [REGRESSION] Data read from a degraded RAID 4/5/6 array could be silently corrupted.

+ more folks.

On Fri, Nov 10, 2023 at 7:00 PM Bhanu Victor DiCara
<00bvd0+linux@xxxxxxxxx> wrote:
>
> A degraded RAID 4/5/6 array can sometimes read 0s instead of the actual data.
>
>
> #regzbot introduced: 10764815ff4728d2c57da677cd5d3dd6f446cf5f
> (The problem does not occur in the previous commit.)
>
> In commit 10764815ff4728d2c57da677cd5d3dd6f446cf5f, file drivers/md/raid5.c, line 5808, there is `md_account_bio(mddev, &bi);`. When this line (and the previous line) is removed, the problem does not occur.

The patch below should fix it. Please test it more thoroughly and
let me know whether it fixes everything. I will send a formal patch
later with more details.

Thanks,
Song

diff --git i/drivers/md/md.c w/drivers/md/md.c
index 68f3bb6e89cb..d4fb1aa5c86f 100644
--- i/drivers/md/md.c
+++ w/drivers/md/md.c
@@ -8674,7 +8674,8 @@ static void md_end_clone_io(struct bio *bio)
        struct bio *orig_bio = md_io_clone->orig_bio;
        struct mddev *mddev = md_io_clone->mddev;

-       orig_bio->bi_status = bio->bi_status;
+       if (bio->bi_status)
+               orig_bio->bi_status = bio->bi_status;

        if (md_io_clone->start_time)
                bio_end_io_acct(orig_bio, md_io_clone->start_time);
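To illustrate what the one-line change guards against, here is a minimal user-space model in Python. It is not kernel code. The assumed scenario is that `orig_bio` already carries an error from another completion (for example, another piece of a split request) when the md clone finishes successfully. `Bio`, `end_clone_old` and `end_clone_fixed` are hypothetical names that only mirror the two versions of `md_end_clone_io()` in the diff. 0 stands for `BLK_STS_OK`, and the error value is an arbitrary non-zero stand-in.

```python
from dataclasses import dataclass

BLK_STS_OK = 0      # success, as in the kernel's blk_status_t
BLK_STS_IOERR = 10  # stand-in error value; any non-zero value behaves the same here


@dataclass
class Bio:
    """Hypothetical stand-in for struct bio: only the status field matters here."""
    bi_status: int = BLK_STS_OK


def end_clone_old(orig_bio: Bio, clone: Bio) -> None:
    # Pre-fix behaviour: unconditionally copy the clone's status, so a
    # successful clone overwrites an error already recorded in orig_bio.
    orig_bio.bi_status = clone.bi_status


def end_clone_fixed(orig_bio: Bio, clone: Bio) -> None:
    # Patched behaviour: only propagate errors, never overwrite with success.
    if clone.bi_status:
        orig_bio.bi_status = clone.bi_status


def simulate(end_clone) -> int:
    orig = Bio()
    # Assumed scenario: some other completion has already failed...
    orig.bi_status = BLK_STS_IOERR
    # ...and then the md clone completes successfully.
    end_clone(orig, Bio(BLK_STS_OK))
    return orig.bi_status


if __name__ == "__main__":
    print("old:  ", simulate(end_clone_old))    # error lost: prints 0
    print("fixed:", simulate(end_clone_fixed))  # error kept: prints 10
```

With the old assignment, the caller would see success for data that was never read correctly. With the patched check, the earlier error survives.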


>
> Similarly, in commit ffc253263a1375a65fa6c9f62a893e9767fbebfa (v6.6), file drivers/md/raid5.c, when line 6200 is removed, the problem does not occur.
>
>
> Steps to reproduce the problem (using bash or similar):
> 1. Create a degraded RAID 4/5/6 array:
> fallocate -l 2056M test_array_part_1.img
> fallocate -l 2056M test_array_part_2.img
> lo1=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_array_part_1.img)
> lo2=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_array_part_2.img)
> # The RAID level must be 4, 5 or 6, with at least one missing drive (in any position). The following configuration seems to be the most effective:
> mdadm --create /dev/md/tmp_test_array --level=4 --raid-devices=3 --chunk=1M --size=2G  $lo1 missing $lo2
>
> 2. Create the test file system and clone it to the degraded array:
> fallocate -l 4G test_fs.img
> mke2fs -t ext4 -b 4096 -i 65536 -m 0 -E stride=256,stripe_width=512 -L test_fs  test_fs.img
> lo3=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_fs.img)
> mount $lo3 /mnt/1
> python3 create_test_fs.py /mnt/1
> umount /mnt/1
> cat test_fs.img > /dev/md/tmp_test_array
> cmp -l test_fs.img /dev/md/tmp_test_array  # Optionally verify the clone
> mount --read-only $lo3 /mnt/1
>
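The numbers in steps 1 and 2 fit together. A 1 MiB chunk holds 256 blocks of 4 KiB, which gives `stride=256`. A 3-device RAID 4 has 2 data devices, so `stripe_width` is 2 × 256 = 512. The usable size is 2 × 2 GiB = 4 GiB, which matches `test_fs.img`. The hypothetical helper below (not part of mdadm or e2fsprogs) reproduces that arithmetic. It assumes one parity device for RAID 4/5 and two for RAID 6, and it counts `missing` slots as members.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PARITY_DEVICES = {4: 1, 5: 1, 6: 2}  # assumed parity devices per stripe


def ext4_geometry(level: int, raid_devices: int, chunk_bytes: int,
                  block_bytes: int, device_bytes: int) -> dict:
    """Derive mke2fs stride/stripe_width and array capacity (hypothetical helper).

    raid_devices counts all member slots, including 'missing' ones,
    because a degraded array keeps the geometry it was created with.
    """
    data_devices = raid_devices - PARITY_DEVICES[level]
    if data_devices < 1 or chunk_bytes % block_bytes:
        raise ValueError("unsupported geometry")
    stride = chunk_bytes // block_bytes          # fs blocks per chunk
    return {
        "data_devices": data_devices,
        "stride": stride,
        "stripe_width": stride * data_devices,   # fs blocks per full data stripe
        "capacity_bytes": device_bytes * data_devices,
    }


if __name__ == "__main__":
    print(ext4_geometry(level=4, raid_devices=3, chunk_bytes=1 * MIB,
                        block_bytes=4 * KIB, device_bytes=2 * GIB))
```

For the configuration in step 1, this gives stride 256, stripe_width 512 and a capacity of 4294967296 bytes (4 GiB). That is the same size as `test_fs.img`, so the `cat` in step 2 fills the array exactly.
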
> 3. Mount the degraded array:
> mount --read-only /dev/md/tmp_test_array /mnt/2
>
> 4. Compare the files:
> diff -q /mnt/1 /mnt/2
>
> If diff reports no differences, run `umount /mnt/2` and `echo 2 > /proc/sys/vm/drop_caches`, then go back to step 3.
> (Running `echo 3 > /proc/sys/vm/drop_caches` and then going to step 4 is less effective.)
> (Running only `umount /mnt/2` and/or `echo 1 > /proc/sys/vm/drop_caches` is much less effective, and the effect wears off over repeated attempts.)
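The retry loop in steps 3 and 4 can be scripted. The following is a hypothetical Python driver, to be run as root after steps 1 and 2, with `/mnt/2` unmounted. It shells out to the same `mount`, `diff -q` and `umount` commands and writes `2` to `/proc/sys/vm/drop_caches`. The round limit and the injectable `run`/`drop` hooks are assumptions, not part of the original report. The hooks exist only so the loop logic can be exercised without root.

```python
import subprocess
from typing import Callable, Optional, Sequence

ARRAY = "/dev/md/tmp_test_array"
REFERENCE_DIR = "/mnt/1"   # read-only mount of test_fs.img (step 2)
ARRAY_DIR = "/mnt/2"       # mount point for the degraded array (step 3)


def sh(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status (no exception on non-zero)."""
    return subprocess.run(list(cmd)).returncode


def drop_caches(level: int = 2) -> None:
    # Same as `echo 2 > /proc/sys/vm/drop_caches`: frees reclaimable slab
    # objects such as dentries and inodes.
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write(f"{level}\n")


def reproduce(max_rounds: int = 100,
              run: Callable[[Sequence[str]], int] = sh,
              drop: Callable[[], None] = drop_caches) -> Optional[int]:
    """Repeat steps 3-4 until diff reports a difference.

    Returns the 1-based round in which a difference was seen, or None.
    """
    for round_no in range(1, max_rounds + 1):
        if run(["mount", "--read-only", ARRAY, ARRAY_DIR]) != 0:
            raise RuntimeError("mount failed")
        # diff exits with 0 when identical, 1 when files differ, 2 on trouble.
        status = run(["diff", "-q", REFERENCE_DIR, ARRAY_DIR])
        if status == 2:
            raise RuntimeError("diff failed")
        if status == 1:
            return round_no  # leave the array mounted for inspection
        if run(["umount", ARRAY_DIR]) != 0:
            raise RuntimeError("umount failed")
        drop()
    return None


if __name__ == "__main__":
    print("difference seen in round:", reproduce())
```
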
>
>
> create_test_fs.py:
> import errno
> import itertools
> import os
> import random
> import sys
>
>
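> def main(test_fs_path):
>         # Fixed seed, so every run produces the same sequence of file sizes.
>         rng = random.Random(0)
>         path = None
>         try:
>                 for i in itertools.count():
>                         # Log-uniform sizes between 4 KiB (2**12) and 16 MiB (2**24).
>                         size = int(2**rng.uniform(12, 24))
>                         path = os.path.join(test_fs_path, str(i).zfill(4) + '.bin')
>                         with open(path, 'xb') as f:
>                                 # Fill with 0xff bytes, so data read back as 0s cannot match.
>                                 f.write(b'\xff' * size)
>                         print(f'Created file {path!r} with size {size}')
>         except OSError as e:
>                 # Stop quietly once the file system is full; re-raise anything else.
>                 if e.errno != errno.ENOSPC:
>                         raise
>                 # Use path rather than f: f is unbound if the very first open() fails.
>                 print(f'Done: {e.strerror} (stopped at file {path!r})')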
>
>
> if __name__ == '__main__':
>         main(sys.argv[1])
>
>
>
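Because create_test_fs.py fills every file with 0xff bytes, the `diff -q` result can be supplemented with a check that shows where a file read from the degraded array goes wrong. The hypothetical helper below (not from this thread) assumes the file system was populated by that script. Under that assumption, any byte other than 0xff in a `*.bin` file indicates corruption. It reports the first bad offset and the number of zero bytes in each affected file.

```python
import os
import sys
from typing import Iterator, Tuple

CHUNK = 1 << 20  # read 1 MiB at a time


def first_bad_offset(path: str, expected: int = 0xFF) -> int:
    """Return the offset of the first byte != expected, or -1 if the file is clean."""
    offset = 0
    with open(path, "rb") as f:
        while block := f.read(CHUNK):
            # Strip leading expected bytes; anything left starts at a bad byte.
            stripped = block.lstrip(bytes([expected]))
            if stripped:
                return offset + len(block) - len(stripped)
            offset += len(block)
    return -1


def scan(mount_point: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (name, first bad offset, zero-byte count) for each corrupted .bin file."""
    for name in sorted(os.listdir(mount_point)):
        if not name.endswith(".bin"):
            continue  # skip lost+found and anything not created by the script
        path = os.path.join(mount_point, name)
        bad = first_bad_offset(path)
        if bad >= 0:
            with open(path, "rb") as f:
                zeros = f.read().count(0)
            yield name, bad, zeros


if __name__ == "__main__":
    for name, bad, zeros in scan(sys.argv[1]):
        print(f"{name}: first bad byte at {bad}, {zeros} zero bytes")
```

Usage, with the array mounted as in step 3: `python3 check_ff.py /mnt/2` (the script name is arbitrary).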