On 5/16/21 4:42 PM, Matthias Ferdinand wrote:
> On Sat, May 15, 2021 at 09:06:07PM +0200, Thorsten Knabe wrote:
>> Hello.
>>
>> Starting with Linux 5.12 bcache triggers a BUG() after a few minutes of
>> usage.
>>
>> Linux up to 5.11.x is not affected by this bug.
>>
>> Environment:
>> - Debian 10 AMD 64
>> - Kernel 5.12 - 5.12.4
>> - Filesystem ext4
>> - Backing device: degraded software RAID-6 (MD) with 3 of 4 disks active
>> (unsure if the degraded RAID-6 has an effect or not)
>> - Cache device: Single SSD
>
> Sorry I can't immediately help with bcache, but for DRBD, there was a
> similar problem with DRBD on degraded md-raid fixed just recently:
>
> https://lists.linbit.com/pipermail/drbd-user/2021-May/025904.html
>
> Although they had silent data corruption AFAICT, not a loud BUG(), and
> they stated problems started with kernel 4.3.
>
> For DRBD it had to do with split BIOs and readahead, which degraded
> md-raid may or may not fail, and missing a fail on parts of a split-up
> readahead BIO.
>
> Matthias
>

This is caused by a hidden issue that is triggered by the bio code change
in v5.12. The attached patch can help to avoid the panic, and the final
fixes are being tested and will be posted very soon.

Coly Li
From 6f2edee7100efabf2ccccb84e4a92ccbfbddd8c5 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli@xxxxxxx>
Date: Thu, 6 May 2021 10:38:41 +0800
Subject: [PATCH] bcache: avoid oversized bio_alloc_bioset() call in cached_dev_cache_miss()

Since Linux v5.12, calling bio_alloc_bioset() with an oversized number of
bio vectors will cause a BUG() panic in biovec_slab().

There are 2 locations in bcache code calling bio_alloc_bioset(), and only
the location in cached_dev_cache_miss() has such a potential oversize
risk. In cached_dev_cache_miss() the number of bio vectors is calculated
by DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS). This patch checks
the calculated result; if it is larger than BIO_MAX_VECS, it gives up
allocating cache_bio and sends the request to the backing device
directly.

With this restriction, the potential BUG() panic can be avoided in the
cache-miss code path.

Signed-off-by: Coly Li <colyli@xxxxxxx>
---
 drivers/md/bcache/request.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 29c231758293..a657d3a2b624 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -879,7 +879,7 @@ static void cached_dev_read_done_bh(struct closure *cl)
 static int cached_dev_cache_miss(struct btree *b, struct search *s,
 				 struct bio *bio, unsigned int sectors)
 {
-	int ret = MAP_CONTINUE;
+	int ret = MAP_CONTINUE, nr_iovecs = 0;
 	unsigned int reada = 0;
 	struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
 	struct bio *miss, *cache_bio;
@@ -916,9 +916,14 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
 
 	/* btree_search_recurse()'s btree iterator is no good anymore */
 	ret = miss == bio ? MAP_DONE : -EINTR;
 
-	cache_bio = bio_alloc_bioset(GFP_NOWAIT,
-			DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS),
-			&dc->disk.bio_split);
+	nr_iovecs = DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS);
+	if (nr_iovecs > BIO_MAX_VECS) {
+		pr_warn("inserting bio is too large: %d iovecs, not intsert.\n",
+			nr_iovecs);
+		goto out_submit;
+	}
+	cache_bio = bio_alloc_bioset(GFP_NOWAIT, nr_iovecs,
+				     &dc->disk.bio_split);
 	if (!cache_bio)
 		goto out_submit;
-- 
2.26.2