On 5/26/23 14:45, Xiao Ni wrote:
On Fri, May 26, 2023 at 11:09 AM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
On 5/26/23 09:49, Xiao Ni wrote:
Hi all
We found a problem recently. The read data is wrong when recovery happens.
Now we've found it's introduced by patch 10764815f (md: add io accounting
for raid0 and raid5). I can reproduce this 100%. This problem exists in
upstream. The test steps are like this:
1. mdadm -CR $devname -l5 -n4 /dev/sd[b-e] --force --assume-clean
2. mkfs.ext4 -F $devname
3. mount $devname $mount_point
4. mdadm --incremental --fail sdd
5. dd if=/dev/zero of=/tmp/pythontest/file1 bs=1M count=100000 status=progress
I suppose /tmp is the mount point.
6. mdadm /dev/md126 --add /dev/sdd
7. create 31 processes that writes and reads. It compares the content with
md5sum. The test will go on until the recovery stops
Could you share the test code/script for step 7? Will try it from my side.
8. wait for about 10 minutes, we can see some processes report checksum is
wrong. But if it re-read the data again, the checksum will be good.
So it is interim, I guess it appeared before recover was finished.
I tried to narrow this problem like this:
- md_account_bio(mddev, &bi);
+ if (rw == WRITE)
+ md_account_bio(mddev, &bi);
If it only do account for write requests, the problem can disappear.
- if (rw == READ && mddev->degraded == 0 &&
- mddev->reshape_position == MaxSector) {
- bi = chunk_aligned_read(mddev, bi);
- if (!bi)
- return true;
- }
+ //if (rw == READ && mddev->degraded == 0 &&
+ // mddev->reshape_position == MaxSector) {
+ // bi = chunk_aligned_read(mddev, bi);
+ // if (!bi)
+ // return true;
+ //}
if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) {
make_discard_request(mddev, bi);
@@ -6180,7 +6180,8 @@ static bool raid5_make_request(struct mddev *mddev,
struct bio * bi)
md_write_end(mddev);
return true;
}
- md_account_bio(mddev, &bi);
+ if (rw == READ)
+ md_account_bio(mddev, &bi);
I comment the chunk_aligned_read out and only account for read requests,
this problem can be reproduced.
After a quick look,raid5_read_one_chunk clones bio by itself, so no need to
do it for the chunk aligned readcase. Could you pls try this?
[...]
Hi Guoqing
When chunk_aligned_read runs successfully, it just returns. In this
case, it does the account by itself. If it fails to execute, it still
needs run md_account_bio in raid5_make_request. So now the logic
should be right.
Hmm, I was confused.
Thanks,
Guoqing