On Mon, May 20, 2024 at 3:27 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2024/05/20 10:55, Yu Kuai wrote:
> > Hi, Changhui
> >
> > On 2024/05/20 8:39, Changhui Zhong wrote:
> >> [czhong@vm linux-block]$ git bisect bad
> >> 060406c61c7cb4bbd82a02d179decca9c9bb3443 is the first bad commit
> >> commit 060406c61c7cb4bbd82a02d179decca9c9bb3443
> >> Author: Yu Kuai <yukuai3@xxxxxxxxxx>
> >> Date:   Thu May 9 20:38:25 2024 +0800
> >>
> >>     block: add plug while submitting IO
> >>
> >>     So that if caller didn't use plug, for example,
> >>     __blkdev_direct_IO_simple() and __blkdev_direct_IO_async(),
> >>     block layer can still benefit from caching nsec time in the plug.
> >>
> >>     Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
> >>     Link: https://lore.kernel.org/r/20240509123825.3225207-1-yukuai1@xxxxxxxxxxxxxxx
> >>     Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> >>
> >>  block/blk-core.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >
> > Thanks for the test!
> >
> > I was surprised to see this blamed commit. After taking a look at the
> > raid1 barrier code, I found that there are some known problems that
> > were fixed in raid10 but are still unfixed in raid1, so I suspect this
> > patch just makes the existing problem easier to reproduce.
> >
> > I'll start cooking patches to sync the raid10 fixes to raid1. Meanwhile,
> > can you change your script to test raid10 as well? If raid10 is fine,
> > I'll give you these patches later to test raid1.
>
> Hi,
>
> Sorry to ask, but since I can't reproduce the problem, and based on
> code review there are multiple potential problems, can you also
> reproduce the problem with the following debug patch (it just adds
> some debug info, no functional changes)? That way I can confirm the
> details of the problem.
>

Hi, Kuai

Yeah, I can test your patch, but I hit a problem when applying it.
Please help check it, and I will test again after you fix it:

```
patching file drivers/md/raid1.c
patch: **** malformed patch at line 42:   idx, nr, RESYNC_DEPTH);
```
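It looks like the printk() in waiting_pending() was wrapped by the mail
client, so the continuation line at patch line 42 has no leading '+',
which is presumably why patch reports the hunk as malformed. If I read
the intent correctly, it should be a single statement, roughly:

```
printk("%s: idx %d barrier %d exceeds %d\n", __func__, idx, nr, RESYNC_DEPTH);
```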
Thanks,
Changhui

> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 113135e7b5f2..b35b847a9e8b 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -936,6 +936,45 @@ static void flush_pending_writes(struct r1conf *conf)
>         spin_unlock_irq(&conf->device_lock);
>  }
>
> +static bool waiting_barrier(struct r1conf *conf, int idx)
> +{
> +       int nr = atomic_read(&conf->nr_waiting[idx]);
> +
> +       if (nr) {
> +               printk("%s: idx %d nr_waiting %d\n", __func__, idx, nr);
> +               return true;
> +       }
> +
> +       return false;
> +}
> +
> +static bool waiting_pending(struct r1conf *conf, int idx)
> +{
> +       int nr;
> +
> +       if (test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery))
> +               return false;
> +
> +       if (conf->array_frozen) {
> +               printk("%s: array is frozen\n", __func__);
> +               return true;
> +       }
> +
> +       nr = atomic_read(&conf->nr_pending[idx]);
> +       if (nr) {
> +               printk("%s: idx %d nr_pending %d\n", __func__, idx, nr);
> +               return true;
> +       }
> +
> +       nr = atomic_read(&conf->barrier[idx]);
> +       if (nr >= RESYNC_DEPTH) {
> +               printk("%s: idx %d barrier %d exceeds %d\n", __func__,
> idx, nr, RESYNC_DEPTH);
> +               return true;
> +       }
> +
> +       return false;
> +}
> +
>  /* Barriers....
>   * Sometimes we need to suspend IO while we do something else,
>   * either some resync/recovery, or reconfigure the array.
> @@ -967,8 +1006,7 @@ static int raise_barrier(struct r1conf *conf, sector_t sector_nr)
>         spin_lock_irq(&conf->resync_lock);
>
>         /* Wait until no block IO is waiting */
> -       wait_event_lock_irq(conf->wait_barrier,
> -                           !atomic_read(&conf->nr_waiting[idx]),
> +       wait_event_lock_irq(conf->wait_barrier, !waiting_barrier(conf, idx),
>                             conf->resync_lock);
>
>         /* block any new IO from starting */
> @@ -990,11 +1028,7 @@ static int raise_barrier(struct r1conf *conf, sector_t sector_nr)
>          * C: while conf->barrier[idx] >= RESYNC_DEPTH, meaning reaches
>          * max resync count which allowed on current I/O barrier bucket.
>          */
> -       wait_event_lock_irq(conf->wait_barrier,
> -                           (!conf->array_frozen &&
> -                            !atomic_read(&conf->nr_pending[idx]) &&
> -                            atomic_read(&conf->barrier[idx]) < RESYNC_DEPTH) ||
> -                           test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery),
> +       wait_event_lock_irq(conf->wait_barrier, !waiting_pending(conf, idx),
>                             conf->resync_lock);
>
>         if (test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery)) {
>
> Thanks,
> Kuai
>
> >
> > Thanks,
> > Kuai
> >
> > .
> >