On Mon, May 29, 2023 at 4:34 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
>
>
>
> On 5/29/23 11:41, Xiao Ni wrote:
> > On Mon, May 29, 2023 at 10:27 AM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
> >>
> >>
> >> On 5/26/23 15:23, Xiao Ni wrote:
> >>>>>>> 6. mdadm /dev/md126 --add /dev/sdd
> >>>>>>> 7. create 31 processes that write and read. They compare the content
> >>>>>>> with md5sum. The test goes on until the recovery stops.
> >>>> Could you share the test code/script for step 7? I will try it from my side.
> >>> The test scripts were written by people from Intel.
> >>> Hi, Mariusz. Can I share the test scripts here?
> >>>
> >>>>>>> 8. wait for about 10 minutes; some processes report that the checksum
> >>>>>>> is wrong. But if they re-read the data, the checksum is good again.
> >>>> So it is interim; I guess it appeared before the recovery was finished.
> >>> Yes, it appears before the recovery finishes. The test will finish once
> >>> the recovery finishes.
> >>>
> >>>>>>> I tried to narrow this problem down like this:
> >>>>>>>
> >>>>>>> -       md_account_bio(mddev, &bi);
> >>>>>>> +       if (rw == WRITE)
> >>>>>>> +               md_account_bio(mddev, &bi);
> >>>>>>> If it only does accounting for write requests, the problem disappears.
> >>>>>>>
> >>>>>>> -       if (rw == READ && mddev->degraded == 0 &&
> >>>>>>> -           mddev->reshape_position == MaxSector) {
> >>>>>>> -               bi = chunk_aligned_read(mddev, bi);
> >>>>>>> -               if (!bi)
> >>>>>>> -                       return true;
> >>>>>>> -       }
> >>>>>>> +       //if (rw == READ && mddev->degraded == 0 &&
> >>>>>>> +       //    mddev->reshape_position == MaxSector) {
> >>>>>>> +       //      bi = chunk_aligned_read(mddev, bi);
> >>>>>>> +       //      if (!bi)
> >>>>>>> +       //              return true;
> >>>>>>> +       //}
> >>>>>>>
> >>>>>>>         if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) {
> >>>>>>>                 make_discard_request(mddev, bi);
> >>>>>>> @@ -6180,7 +6180,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
> >>>>>>>                 md_write_end(mddev);
> >>>>>>>                 return true;
> >>>>>>>         }
> >>>>>>> -       md_account_bio(mddev, &bi);
> >>>>>>> +       if (rw == READ)
> >>>>>>> +               md_account_bio(mddev, &bi);
> >>>>>>>
> >>>>>>> If I comment chunk_aligned_read out and only account for read requests,
> >>>>>>> this problem can still be reproduced.
> >> Only write bios and non-chunk-aligned read bios call md_account_bio, and
> >> accounting only write bios is fine per your test. That means the md5sum
> >> mismatches come from non-chunk-aligned read bios, so it is not abnormal
> >> that data in another chunk could change while the recovery is not
> >> finished, right?
> > That's right, only non-aligned read requests can cause this problem.
> > Good catch. If I understand right, you mean the non-aligned read
> > request reads data from a chunk which hasn't been recovered yet, right?
>
> Yes, I don't think comparing md5sums makes much sense for such a scenario
> given the state is interim. And it also appeared in my test with io
> accounting disabled.

It's the data that customers read that can be wrong, so it's a dangerous
thing: customers use that data directly. If that's true, it looks like
there is a race between non-aligned reads and the recovery process.
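To make the window concrete, each of the 31 workers in step 7 does roughly
the following. This is only a sketch of the shape of the test, not the
actual Intel scripts; the mount point, the file size and the use of
/proc/mdstat as the stop condition are my own assumptions:

#!/bin/bash
# Rough sketch of one worker: write a file on the filesystem on top of
# /dev/md126, remember its md5sum, then keep re-reading and comparing
# while the recovery from step 6 is still running.
MNT=/mnt/md126test                       # assumed mount point
F=$MNT/worker.$$.dat

dd if=/dev/urandom of="$F" bs=1M count=64 conv=fsync 2>/dev/null
GOOD=$(md5sum < "$F" | awk '{print $1}')

while grep -q recovery /proc/mdstat; do
        echo 3 > /proc/sys/vm/drop_caches    # force the read to hit the array
        NOW=$(md5sum < "$F" | awk '{print $1}')
        if [ "$NOW" != "$GOOD" ]; then
                echo "$F: checksum mismatch ($NOW != $GOOD)"
                # re-read right away; per the report above the second
                # read comes back with the good checksum again
                md5sum < "$F"
        fi
done

Dropping the page cache before each comparison is what forces the re-read
to go through raid5_make_request again instead of being served from memory.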
Regards
Xiao

>
> >> BTW, I ran the test with bio accounting disabled by default, and the
> >> result seems to be the same.
> >>
> >>> git tag --sort=taggerdate --contain 10764815f |head -1
> >> v5.14-rc1
> >>
> >> localhost:~/readdata #uname -r
> >> 5.15.0-rc4-59.24-default
> >> localhost:~/readdata #cat /sys/block/md126/queue/iostats
> >> 0
> >>
> >> And I can still see the relevant log from the terminal which runs 01-test.sh.
> > Hmm, thanks for this. I'll try again. Which kind of disks do you use
> > for testing?
>
> Four SCSI disks (1G capacity) inside a VM.
>
> Thanks,
> Guoqing
>
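P.S. For anyone else trying to reproduce this: the accounting disabled
above is the per-queue iostats knob. If I read md_account_bio right, it
returns early when iostats is 0, so the read path then never clones the
bio for accounting. Roughly:

# 1 = io accounting on, 0 = off (matches the output shown above)
cat /sys/block/md126/queue/iostats

# switch it off and rerun steps 6 and 7; if the mismatches still show up,
# the race is independent of the accounting clone
echo 0 > /sys/block/md126/queue/iostats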