[REGRESSION] Data read from a degraded RAID 4/5/6 array could be silently corrupted.

Reading from a degraded RAID 4/5/6 array can sometimes return 0s instead of the actual data.


#regzbot introduced: 10764815ff4728d2c57da677cd5d3dd6f446cf5f
(The problem does not occur with the parent commit.)

In commit 10764815ff4728d2c57da677cd5d3dd6f446cf5f, line 5808 of drivers/md/raid5.c is `md_account_bio(mddev, &bi);`. When this line (together with the line before it) is removed, the problem does not occur.

Similarly, at commit ffc253263a1375a65fa6c9f62a893e9767fbebfa (v6.6), removing line 6200 of drivers/md/raid5.c makes the problem disappear.
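For reference, the hunk that introduced the call can be inspected in a kernel tree with git; the trailing path just limits the diff to the affected file:

git show 10764815ff4728d2c57da677cd5d3dd6f446cf5f -- drivers/md/raid5.c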


Steps to reproduce the problem (using bash or similar):
1. Create a degraded RAID 4/5/6 array:
fallocate -l 2056M test_array_part_1.img
fallocate -l 2056M test_array_part_2.img
lo1=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_array_part_1.img)
lo2=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_array_part_2.img)
# The RAID level must be 4, 5, or 6, with at least one missing drive, in any slot. The following configuration seems to be the most effective:
mdadm --create /dev/md/tmp_test_array --level=4 --raid-devices=3 --chunk=1M --size=2G  $lo1 missing $lo2
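# Optionally confirm that the array came up degraded before continuing
# (a sketch; the exact output format may vary, but /proc/mdstat should show
# a missing-device marker such as [U_U]):
cat /proc/mdstat
mdadm --detail /dev/md/tmp_test_array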

2. Create the test file system and clone it to the degraded array:
fallocate -l 4G test_fs.img
mke2fs -t ext4 -b 4096 -i 65536 -m 0 -E stride=256,stripe_width=512 -L test_fs  test_fs.img
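# Optionally confirm the RAID layout hints took effect (the field names,
# e.g. "RAID stride", are an assumption based on current e2fsprogs output):
dumpe2fs -h test_fs.img | grep -i raid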
lo3=$(losetup --sector-size 4096 --find --nooverlap --direct-io --show  test_fs.img)
mkdir -p /mnt/1 /mnt/2  # Ensure the mount points exist
mount $lo3 /mnt/1
python3 create_test_fs.py /mnt/1
umount /mnt/1
cat test_fs.img > /dev/md/tmp_test_array
cmp -l test_fs.img /dev/md/tmp_test_array  # Optionally verify the clone
mount --read-only $lo3 /mnt/1

3. Mount the degraded array:
mount --read-only /dev/md/tmp_test_array /mnt/2

4. Compare the files:
diff -q /mnt/1 /mnt/2

If no files are reported as different, do `umount /mnt/2` and `echo 2 > /proc/sys/vm/drop_caches`, and then go back to step 3. (A loop that automates this retry cycle is sketched below.)
(Doing `echo 3 > /proc/sys/vm/drop_caches` and then going back to step 4 is less effective.)
(Doing only `umount /mnt/2` and/or `echo 1 > /proc/sys/vm/drop_caches` is much less effective, and its effectiveness wears off.)
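A minimal retry loop automating steps 3 and 4 (a sketch, assuming the array, mount points, and test files created above; must be run as root):

while :; do
	mount --read-only /dev/md/tmp_test_array /mnt/2
	if ! diff -q /mnt/1 /mnt/2; then
		echo 'Mismatch found; leaving /mnt/2 mounted for inspection.'
		break
	fi
	umount /mnt/2
	echo 2 > /proc/sys/vm/drop_caches
done

When diff reports a differing file, something like `cmp -l /mnt/1/0000.bin /mnt/2/0000.bin | head` (the file name is just an example) lists the differing bytes; with this test file system, corrupted ranges read as 0s where 0xff is expected.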


create_test_fs.py (the script fills the file system with files consisting of 0xff bytes, so blocks that wrongly read back as 0s are easy to spot):
import errno
import itertools
import os
import random
import sys


def main(test_fs_path):
	# Seeded RNG so every run creates the same sequence of file sizes.
	rng = random.Random(0)
	try:
		for i in itertools.count():
			# Log-uniform size between 2**12 (4 KiB) and 2**24 (16 MiB).
			size = int(2**rng.uniform(12, 24))
			with open(os.path.join(test_fs_path, str(i).zfill(4) + '.bin'), 'xb') as f:
				# All-0xff content: any block that reads back as 0s is corruption.
				f.write(b'\xff' * size)
			print(f'Created file {f.name!r} with size {size}')
	except OSError as e:
		# Stop cleanly when the file system is full; re-raise anything else.
		if e.errno != errno.ENOSPC:
			raise
		print(f'Done: {e.strerror} (partially created file {f.name!r})')


if __name__ == '__main__':
	main(sys.argv[1])
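
For completeness, a teardown sketch (assuming the shell variables $lo1, $lo2 and $lo3 from the steps above are still set; /mnt/2 may already be unmounted):

umount /mnt/1 /mnt/2
mdadm --stop /dev/md/tmp_test_array
losetup -d $lo1 $lo2 $lo3
rm test_array_part_1.img test_array_part_2.img test_fs.img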
