Hi Coly Li,

Yes, it mainly fixes the warning from the kernel test robot.

mingzhe

Original:
From: colyli <colyli@xxxxxxx>
Date: 2024-12-22 00:17:06 (China (GMT+08:00))
To: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
Cc: linux-bcache <linux-bcache@xxxxxxxxxxxxxxx>, zoumingzhe <zoumingzhe@xxxxxx>
Subject: Re: [PATCH v2 2/3] bcache: fix io error during cache read race

On 2024-12-20 13:20, Mingzhe Zou wrote:
> Hi, Coly:
>
> Our users have reported this issue to us in their production
> environment!
>
> Please review these patches and provide feedback.
>
> Thank you very much.

Hi Mingzhe,

Yes, it is planned for next week; I will start to look at this series.

BTW, I don't see a change log from the v1 to the v2 series. Can I assume
the v2 series fixes the warning reported by the kernel test robot?

Thanks.

Coly Li

>
> mingzhe
>
> Original:
> From: mingzhe.zou <mingzhe.zou@xxxxxxxxxxxx>
> Date: 2024-11-19 15:40:30 (China (GMT+08:00))
> To: colyli <colyli@xxxxxxx>
> Cc: linux-bcache <linux-bcache@xxxxxxxxxxxxxxx>,
>     dongsheng.yang <dongsheng.yang@xxxxxxxxxxxx>,
>     zoumingzhe <zoumingzhe@xxxxxx>
> Subject: [PATCH v2 2/3] bcache: fix io error during cache read race
>
> From: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
>
> In our production environment, bcache returned IO_ERROR (errno=-5).
> These errors always happened during 1M read I/O under high pressure,
> without any message in the log. When the error occurred, we stopped
> all reading and writing and used 1M read I/O to read the entire disk
> without any errors. Later we found that cache_read_races of the
> cache_set was non-zero.
>
> If a large (1M) read bio is split into two or more bios, then when one
> bio reads dirty data, s->read_dirty_data is set to true and remains set.
> If the bucket is reused while a subsequent read bio is in flight, the
> read becomes unrecoverable (the data cannot be read from the backing
> device).
>
> This patch increments bucket->pin to prevent the bucket from being
> reclaimed and reused.
>
> Signed-off-by: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
> ---
>  drivers/md/bcache/request.c | 39 ++++++++++++++++++++++++-------------
>  1 file changed, 26 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index af345dc6fde1..6c41957138e5 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -502,12 +502,8 @@ static void bch_cache_read_endio(struct bio *bio)
>  	struct closure *cl = bio->bi_private;
>  	struct search *s = container_of(cl, struct search, cl);
>
> -	/*
> -	 * If the bucket was reused while our bio was in flight, we might have
> -	 * read the wrong data. Set s->error but not error so it doesn't get
> -	 * counted against the cache device, but we'll still reread the data
> -	 * from the backing device.
> -	 */
> +	BUG_ON(ptr_stale(s->iop.c, &b->key, 0)); // bucket should not be reused
> +	atomic_dec(&PTR_BUCKET(s->iop.c, &b->key, 0)->pin);
>
>  	if (bio->bi_status)
>  		s->iop.status = bio->bi_status;
> @@ -520,6 +516,8 @@ static void bch_cache_read_endio(struct bio *bio)
>  	bch_bbio_endio(s->iop.c, bio, bio->bi_status, "reading from cache");
>  }
>
> +static void backing_request_endio(struct bio *bio);
> +
>  /*
>   * Read from a single key, handling the initial cache miss if the key starts in
>   * the middle of the bio
>   */
> @@ -529,7 +527,6 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k)
>  	struct search *s = container_of(op, struct search, op);
>  	struct bio *n, *bio = &s->bio.bio;
>  	struct bkey *bio_key;
> -	unsigned int ptr;
>
>  	if (bkey_cmp(k, &KEY(s->iop.inode, bio->bi_iter.bi_sector, 0)) <= 0)
>  		return MAP_CONTINUE;
> @@ -553,20 +550,36 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k)
>  	if (!KEY_SIZE(k))
>  		return MAP_CONTINUE;
>
> -	/* XXX: figure out best pointer - for multiple cache devices */
> -	ptr = 0;
> +	/*
> +	 * If the bucket was reused while our bio was in flight, we might have
> +	 * read the wrong data. Set s->cache_read_races and reread the data
> +	 * from the backing device.
> +	 */
> +	spin_lock(&PTR_BUCKET(b->c, k, 0)->lock);
> +	if (ptr_stale(s->iop.c, k, 0)) {
> +		spin_unlock(&PTR_BUCKET(b->c, k, 0)->lock);
> +		atomic_long_inc(&s->iop.c->cache_read_races);
> +		pr_warn("%pU cache read race count: %lu", s->iop.c->sb.set_uuid,
> +			atomic_long_read(&s->iop.c->cache_read_races));
>
> -	PTR_BUCKET(b->c, k, ptr)->prio = INITIAL_PRIO;
> +		n->bi_end_io = backing_request_endio;
> +		n->bi_private = &s->cl;
> +
> +		/* I/O request sent to backing device */
> +		closure_bio_submit(s->iop.c, n, &s->cl);
> +		return n == bio ? MAP_DONE : MAP_CONTINUE;
> +	}
> +	atomic_inc(&PTR_BUCKET(s->iop.c, k, 0)->pin);
> +	spin_unlock(&PTR_BUCKET(b->c, k, 0)->lock);
>
> -	if (KEY_DIRTY(k))
> -		s->read_dirty_data = true;
> +	PTR_BUCKET(b->c, k, 0)->prio = INITIAL_PRIO;
>
>  	n = bio_next_split(bio, min_t(uint64_t, INT_MAX,
>  				      KEY_OFFSET(k) - bio->bi_iter.bi_sector),
>  			   GFP_NOIO, &s->d->bio_split);
>
>  	bio_key = &container_of(n, struct bbio, bio)->key;
> -	bch_bkey_copy_single_ptr(bio_key, k, ptr);
> +	bch_bkey_copy_single_ptr(bio_key, k, 0);
>
>  	bch_cut_front(&KEY(s->iop.inode, n->bi_iter.bi_sector, 0), bio_key);
>  	bch_cut_back(&KEY(s->iop.inode, bio_end_sector(n), 0), bio_key);
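For readers following the locking logic, here is a minimal user-space sketch
(not bcache code; toy_bucket, toy_pin_for_read, toy_read_done and toy_reclaim
are hypothetical names, pthreads and C11 atomics stand in for the kernel
primitives, and the reclaim side is simplified: the real allocator checks
bucket->pin on its own path) of the check-then-pin pattern the patch uses in
cache_lookup_fn(): staleness is tested and the pin is taken under the same
per-bucket lock, so the bucket cannot be invalidated between the check and
the completion of the read.

/*
 * Toy model, user space only: check staleness and take a pin under
 * the same lock, so the bucket cannot be reclaimed in between.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_bucket {
	pthread_mutex_t lock;	/* stands in for the bucket spinlock */
	atomic_int pin;		/* readers in flight; blocks reclaim */
	int gen;		/* bumped every time the bucket is reused */
};

/* Lookup side, as in cache_lookup_fn(): check-then-pin under the lock. */
static bool toy_pin_for_read(struct toy_bucket *b, int key_gen)
{
	pthread_mutex_lock(&b->lock);
	if (b->gen != key_gen) {		/* like ptr_stale(): bucket reused */
		pthread_mutex_unlock(&b->lock);
		return false;			/* caller rereads from backing */
	}
	atomic_fetch_add(&b->pin, 1);		/* pin before dropping the lock */
	pthread_mutex_unlock(&b->lock);
	return true;
}

/* Endio side, as in bch_cache_read_endio(): drop the pin when done. */
static void toy_read_done(struct toy_bucket *b)
{
	atomic_fetch_sub(&b->pin, 1);
}

/* Reclaim side: may only bump the generation of an unpinned bucket. */
static bool toy_reclaim(struct toy_bucket *b)
{
	bool reused = false;

	pthread_mutex_lock(&b->lock);
	if (atomic_load(&b->pin) == 0) {
		b->gen++;			/* invalidates stale keys */
		reused = true;
	}
	pthread_mutex_unlock(&b->lock);
	return reused;
}

int main(void)
{
	struct toy_bucket b = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.gen = 1,
	};

	if (toy_pin_for_read(&b, 1)) {
		printf("reclaim while pinned: %s\n",
		       toy_reclaim(&b) ? "reused (bug)" : "blocked (ok)");
		toy_read_done(&b);
	}
	printf("reclaim after unpin:  %s\n",
	       toy_reclaim(&b) ? "reused (ok)" : "blocked (bug)");
	return 0;
}

Built with "cc -pthread", this prints that reclaim is blocked while the read
holds a pin and succeeds after toy_read_done() drops it, which is exactly the
property that closes the read race the commit message describes.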