Re: PROBLEM: bcache related kernel BUG() since Linux 5.12

On 5/16/21 4:42 PM, Matthias Ferdinand wrote:
> On Sat, May 15, 2021 at 09:06:07PM +0200, Thorsten Knabe wrote:
>> Hello.
>>
>> Starting with Linux 5.12 bcache triggers a BUG() after a few minutes of
>> usage.
>>
>> Linux up to 5.11.x is not affected by this bug.
>>
>> Environment:
>> - Debian 10 AMD 64
>> - Kernel 5.12 - 5.12.4
>> - Filesystem ext4
>> - Backing device: degraded software RAID-6 (MD) with 3 of 4 disks active
>>   (unsure if the degraded RAID-6 has an effect or not)
>> - Cache device: Single SSD
> 
> Sorry I can't immediately help with bcache, but for DRBD, there was a
> similar problem with DRBD on degraded md-raid fixed just recently:
> 
>     https://lists.linbit.com/pipermail/drbd-user/2021-May/025904.html
> 
> Although they had silent data corruption AFAICT, not a loud BUG(), and
> they stated problems started with kernel 4.3.
> 
> For DRBD it had to do with split BIOs and readahead, which degraded
> md-raid may or may not fail, and missing a fail on parts of a split-up
> readahead BIO.
> 
> Matthias
> 


This is caused by a hidden issue that is triggered by the bio code
change in v5.12.

The attached patch should avoid the panic; the final fixes are being
tested and will be posted very soon.

Coly Li
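
As background on the v5.12 bio change: as far as we understand the
v5.12 sources, block/bio.c maps the requested number of bio vectors to a
bvec slab size class in biovec_slab(), and the default case of that
lookup calls BUG() when the count exceeds BIO_MAX_VECS. The sketch below
is a userspace model of that lookup, not kernel code. biovec_capacity()
is a hypothetical name, and the size classes (4 inline vectors, then 16,
64, 128 and BIO_MAX_VECS = 256) are assumed from v5.12.

```c
#include <assert.h>

/* Assumed v5.12 values; BIO_INLINE_VECS is private to block/bio.c. */
#define BIO_INLINE_VECS 4
#define BIO_MAX_VECS    256

/*
 * Hypothetical userspace model of the size-class lookup in biovec_slab().
 * Returns how many bio_vecs the backing array would hold, or -1 where
 * the kernel's lookup falls through to BUG().
 */
static int biovec_capacity(unsigned int nr_vecs)
{
	if (nr_vecs <= BIO_INLINE_VECS)
		return BIO_INLINE_VECS;	/* small bios use the inline vecs */
	if (nr_vecs <= 16)
		return 16;
	if (nr_vecs <= 64)
		return 64;
	if (nr_vecs <= 128)
		return 128;
	if (nr_vecs <= BIO_MAX_VECS)
		return BIO_MAX_VECS;
	return -1;			/* kernel: default case, BUG() */
}
```

In this model biovec_capacity(256) still returns 256, while
biovec_capacity(257) returns -1: any caller whose vector count rounds up
past BIO_MAX_VECS ends in the panic path.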
From 6f2edee7100efabf2ccccb84e4a92ccbfbddd8c5 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli@xxxxxxx>
Date: Thu, 6 May 2021 10:38:41 +0800
Subject: [PATCH] bcache: avoid oversized bio_alloc_bioset() call in
 cached_dev_cache_miss()

Since Linux v5.12, calling bio_alloc_bioset() with an oversized number of
bio vectors causes a BUG() panic in biovec_slab(). There are 2 locations
in the bcache code that call bio_alloc_bioset(), and only the one in
cached_dev_cache_miss() carries this oversize risk.

In cached_dev_cache_miss(), the number of bio vectors is calculated by
DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS). This patch checks the
result: if it is larger than BIO_MAX_VECS, the allocation of cache_bio is
skipped and the request is sent to the backing device directly.

With this restriction, the potential BUG() panic is avoided in the
cache miss code path.

Signed-off-by: Coly Li <colyli@xxxxxxx>
---
 drivers/md/bcache/request.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 29c231758293..a657d3a2b624 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -879,7 +879,7 @@ static void cached_dev_read_done_bh(struct closure *cl)
 static int cached_dev_cache_miss(struct btree *b, struct search *s,
 				 struct bio *bio, unsigned int sectors)
 {
-	int ret = MAP_CONTINUE;
+	int ret = MAP_CONTINUE, nr_iovecs = 0;
 	unsigned int reada = 0;
 	struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
 	struct bio *miss, *cache_bio;
@@ -916,9 +916,14 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
 	/* btree_search_recurse()'s btree iterator is no good anymore */
 	ret = miss == bio ? MAP_DONE : -EINTR;
 
-	cache_bio = bio_alloc_bioset(GFP_NOWAIT,
-			DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS),
-			&dc->disk.bio_split);
+	nr_iovecs = DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS);
+	if (nr_iovecs > BIO_MAX_VECS) {
+		pr_warn("inserting bio is too large: %d iovecs, not inserted.\n",
+			nr_iovecs);
+		goto out_submit;
+	}
+	cache_bio = bio_alloc_bioset(GFP_NOWAIT, nr_iovecs,
+				     &dc->disk.bio_split);
 	if (!cache_bio)
 		goto out_submit;
 
-- 
2.26.2
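
The guard the patch adds to cached_dev_cache_miss() is plain integer
arithmetic. Below is a userspace sketch of it, assuming 4 KiB pages (so
PAGE_SECTORS is 8) and BIO_MAX_VECS of 256. The helper names are
hypothetical, and DIV_ROUND_UP is redefined locally with the same
round-up behaviour as the kernel macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumptions: 512-byte sectors, 4 KiB pages, v5.12 BIO_MAX_VECS. */
#define SECTOR_SIZE        512u
#define ASSUMED_PAGE_SIZE  4096u
#define PAGE_SECTORS       (ASSUMED_PAGE_SIZE / SECTOR_SIZE)	/* 8 */
#define BIO_MAX_VECS       256u
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: the nr_iovecs value the patched code computes. */
static unsigned int cache_miss_nr_iovecs(unsigned int insert_bio_sectors)
{
	return DIV_ROUND_UP(insert_bio_sectors, PAGE_SECTORS);
}

/*
 * Hypothetical helper: true means cache_bio is allocated as before;
 * false means the patched code jumps to out_submit instead.
 */
static bool cache_miss_should_alloc(unsigned int insert_bio_sectors)
{
	return cache_miss_nr_iovecs(insert_bio_sectors) <= BIO_MAX_VECS;
}
```

Under these assumptions the largest insert that still gets a cache_bio is
256 * 8 = 2048 sectors (1 MiB). At 2049 sectors nr_iovecs becomes 257,
and the patched code skips bio_alloc_bioset() rather than reaching the
BUG() in biovec_slab().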

