From: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>

In our production environment, bcache returned IO_ERROR (errno=-5).
These errors always happened during 1M read IO under high pressure,
and without any log messages. When the error occurred, we stopped all
reads and writes and used 1M read IO to read the entire disk, which
completed without any errors. Later we found that cache_read_races of
the cache_set was non-zero.

If a large (1M) read bio is split into two or more bios, when one bio
reads dirty data, s->read_dirty_data will be set to true and remain
set. If the bucket was reused while a subsequent read bio was in
flight, the read will be unrecoverable (cannot read data from backing).

This patch reassigns s->recoverable and s->read_dirty_data before each
cache read. When a race condition occurs, check whether the data can be
read from the backing device.

Signed-off-by: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
---
 drivers/md/bcache/request.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index af345dc6fde1..e9cb3ad323d4 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -513,6 +513,7 @@ static void bch_cache_read_endio(struct bio *bio)
 		s->iop.status = bio->bi_status;
 	else if (!KEY_DIRTY(&b->key) &&
 		 ptr_stale(s->iop.c, &b->key, 0)) {
+		BUG_ON(s->recoverable && s->read_dirty_data);
 		atomic_long_inc(&s->iop.c->cache_read_races);
 		s->iop.status = BLK_STS_IOERR;
 	}
@@ -558,8 +559,9 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k)

 	PTR_BUCKET(b->c, k, ptr)->prio = INITIAL_PRIO;

-	if (KEY_DIRTY(k))
-		s->read_dirty_data = true;
+	s->read_dirty_data = KEY_DIRTY(k) ? true : false;
+	/* Cache read errors are recoverable */
+	s->recoverable = true;

 	n = bio_next_split(bio, min_t(uint64_t, INT_MAX,
 				      KEY_OFFSET(k) - bio->bi_iter.bi_sector),
@@ -574,6 +576,7 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k)
 	n->bi_end_io = bch_cache_read_endio;
 	n->bi_private = &s->cl;

+
 	/*
 	 * The bucket we're reading from might be reused while our bio
 	 * is in flight, and we could then end up reading the wrong
-- 
2.34.1