Hi Coly Li,

Yes, it mainly fixes the warning from the kernel test robot.

mingzhe

Original:
From: colyli <colyli@xxxxxxx>
Date: 2024-12-22 00:17:06 (China (GMT+08:00))
To: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
Cc: linux-bcache <linux-bcache@xxxxxxxxxxxxxxx>, zoumingzhe <zoumingzhe@xxxxxx>
Subject: Re: [PATCH v2 2/3] bcache: fix io error during cache read race

On 2024-12-20 13:20, Mingzhe Zou wrote:
> Hi, Coly:
>
> Our users have reported this issue to us in their production
> environment!
>
> Please review these patches and provide feedback.
>
> Thank you very much.

Hi Mingzhe,

Yes, it is planned for next week; I will start to look at this series.

BTW, I don't see a change log from the v1 to the v2 series. Can I assume
the v2 series fixes the warning reported by the kernel test robot?

Thanks.

Coly Li

>
> mingzhe
>
> Original:
> From: mingzhe.zou <mingzhe.zou@xxxxxxxxxxxx>
> Date: 2024-11-19 15:40:30 (China (GMT+08:00))
> To: colyli <colyli@xxxxxxx>
> Cc: linux-bcache <linux-bcache@xxxxxxxxxxxxxxx>,
>     dongsheng.yang <dongsheng.yang@xxxxxxxxxxxx>,
>     zoumingzhe <zoumingzhe@xxxxxx>
> Subject: [PATCH v2 2/3] bcache: fix io error during cache read race
>
> From: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
>
> In our production environment, bcache returned IO_ERROR (errno=-5).
> These errors always happened during 1M read I/O under high pressure,
> without any message in the log. When the error occurred, we stopped
> all reading and writing and used 1M read I/O to read the entire disk
> without any errors. Later we found that cache_read_races of the
> cache_set was non-zero.
>
> If a large (1M) read bio is split into two or more bios, then when one
> bio reads dirty data, s->read_dirty_data is set to true and remains set.
> If the bucket is reused while a subsequent read bio is in flight, the
> read becomes unrecoverable (the data cannot be read from the backing
> device).
>
> This patch increments bucket->pin to prevent the bucket from being
> reclaimed and reused.
>
> Signed-off-by: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
> ---
>  drivers/md/bcache/request.c | 39 ++++++++++++++++++++++++-------------
>  1 file changed, 26 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index af345dc6fde1..6c41957138e5 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -502,12 +502,8 @@ static void bch_cache_read_endio(struct bio *bio)
>  	struct closure *cl = bio->bi_private;
>  	struct search *s = container_of(cl, struct search, cl);
>
> -	/*
> -	 * If the bucket was reused while our bio was in flight, we might have
> -	 * read the wrong data. Set s->error but not error so it doesn't get
> -	 * counted against the cache device, but we'll still reread the data
> -	 * from the backing device.
> -	 */
> +	BUG_ON(ptr_stale(s->iop.c, &b->key, 0)); // bucket should not be reused
> +	atomic_dec(&PTR_BUCKET(s->iop.c, &b->key, 0)->pin);
>
>  	if (bio->bi_status)
>  		s->iop.status = bio->bi_status;
> @@ -520,6 +516,8 @@ static void bch_cache_read_endio(struct bio *bio)
>  	bch_bbio_endio(s->iop.c, bio, bio->bi_status, "reading from cache");
>  }
>
> +static void backing_request_endio(struct bio *bio);
> +
>  /*
>   * Read from a single key, handling the initial cache miss if the key starts in
>   * the middle of the bio
>   */
> @@ -529,7 +527,6 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k)
>  	struct search *s = container_of(op, struct search, op);
>  	struct bio *n, *bio = &s->bio.bio;
>  	struct bkey *bio_key;
> -	unsigned int ptr;
>
>  	if (bkey_cmp(k, &KEY(s->iop.inode, bio->bi_iter.bi_sector, 0)) <= 0)
>  		return MAP_CONTINUE;
> @@ -553,20 +550,36 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k)
>  	if (!KEY_SIZE(k))
>  		return MAP_CONTINUE;
>
> -	/* XXX: figure out best pointer - for multiple cache devices */
> -	ptr = 0;
> +	/*
> +	 * If the bucket was reused while our bio was in flight, we might have
> +	 * read the wrong data. Set s->cache_read_races and reread the data
> +	 * from the backing device.
> +	 */
> +	spin_lock(&PTR_BUCKET(b->c, k, 0)->lock);
> +	if (ptr_stale(s->iop.c, k, 0)) {
> +		spin_unlock(&PTR_BUCKET(b->c, k, 0)->lock);
> +		atomic_long_inc(&s->iop.c->cache_read_races);
> +		pr_warn("%pU cache read race count: %lu", s->iop.c->sb.set_uuid,
> +			atomic_long_read(&s->iop.c->cache_read_races));
>
> -	PTR_BUCKET(b->c, k, ptr)->prio = INITIAL_PRIO;
> +		n->bi_end_io = backing_request_endio;
> +		n->bi_private = &s->cl;
> +
> +		/* I/O request sent to backing device */
> +		closure_bio_submit(s->iop.c, n, &s->cl);
> +		return n == bio ? MAP_DONE : MAP_CONTINUE;
> +	}
> +	atomic_inc(&PTR_BUCKET(s->iop.c, k, 0)->pin);
> +	spin_unlock(&PTR_BUCKET(b->c, k, 0)->lock);
>
> -	if (KEY_DIRTY(k))
> -		s->read_dirty_data = true;
> +	PTR_BUCKET(b->c, k, 0)->prio = INITIAL_PRIO;
>
>  	n = bio_next_split(bio, min_t(uint64_t, INT_MAX,
>  				      KEY_OFFSET(k) - bio->bi_iter.bi_sector),
>  			   GFP_NOIO, &s->d->bio_split);
>
>  	bio_key = &container_of(n, struct bbio, bio)->key;
> -	bch_bkey_copy_single_ptr(bio_key, k, ptr);
> +	bch_bkey_copy_single_ptr(bio_key, k, 0);
>
>  	bch_cut_front(&KEY(s->iop.inode, n->bi_iter.bi_sector, 0), bio_key);
>  	bch_cut_back(&KEY(s->iop.inode, bio_end_sector(n), 0), bio_key);
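For readers following the locking logic, here is a minimal user-space sketch
(not bcache code; toy_bucket, toy_pin_for_read, toy_read_done and toy_reclaim
are hypothetical names, pthreads and C11 atomics stand in for the kernel
primitives, and the reclaim side is simplified: the real allocator checks
bucket->pin on its own path) of the check-then-pin pattern the patch uses in
cache_lookup_fn(): staleness is tested and the pin is taken under the same
per-bucket lock, so the bucket cannot be invalidated between the check and
the completion of the read.

/*
 * Toy model, user space only: check staleness and take a pin under
 * the same lock, so the bucket cannot be reclaimed in between.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_bucket {
	pthread_mutex_t lock;	/* stands in for the bucket spinlock */
	atomic_int pin;		/* readers in flight; blocks reclaim */
	int gen;		/* bumped every time the bucket is reused */
};

/* Lookup side, as in cache_lookup_fn(): check-then-pin under the lock. */
static bool toy_pin_for_read(struct toy_bucket *b, int key_gen)
{
	pthread_mutex_lock(&b->lock);
	if (b->gen != key_gen) {		/* like ptr_stale(): bucket reused */
		pthread_mutex_unlock(&b->lock);
		return false;			/* caller rereads from backing */
	}
	atomic_fetch_add(&b->pin, 1);		/* pin before dropping the lock */
	pthread_mutex_unlock(&b->lock);
	return true;
}

/* Endio side, as in bch_cache_read_endio(): drop the pin when done. */
static void toy_read_done(struct toy_bucket *b)
{
	atomic_fetch_sub(&b->pin, 1);
}

/* Reclaim side: may only bump the generation of an unpinned bucket. */
static bool toy_reclaim(struct toy_bucket *b)
{
	bool reused = false;

	pthread_mutex_lock(&b->lock);
	if (atomic_load(&b->pin) == 0) {
		b->gen++;			/* invalidates stale keys */
		reused = true;
	}
	pthread_mutex_unlock(&b->lock);
	return reused;
}

int main(void)
{
	struct toy_bucket b = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.gen = 1,
	};

	if (toy_pin_for_read(&b, 1)) {
		printf("reclaim while pinned: %s\n",
		       toy_reclaim(&b) ? "reused (bug)" : "blocked (ok)");
		toy_read_done(&b);
	}
	printf("reclaim after unpin:  %s\n",
	       toy_reclaim(&b) ? "reused (ok)" : "blocked (bug)");
	return 0;
}

Built with "cc -pthread", this prints that reclaim is blocked while the read
holds a pin and succeeds after toy_read_done() drops it, which is exactly the
property that closes the read race the commit message describes.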