On 10/27/22 21:52, Jens Axboe wrote:
On 10/27/22 2:45 PM, Pavel Begunkov wrote:
On 10/27/22 21:44, Jens Axboe wrote:
On 10/27/22 4:04 AM, Kanchan Joshi wrote:
If the cache does not have any entries, make sure to detect that and return
failure. Otherwise this leads to a null pointer dereference.
Fixes: 13a184e26965 ("block/bio: add pcpu caching for non-polling bio_put")
Signed-off-by: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
---
Can be reproduced by:
fio -direct=1 -iodepth=1 -rw=randread -ioengine=io_uring -bs=4k -numjobs=1 -size=4k -filename=/dev/nvme0n1 -hipri=1 -name=block
BUG: KASAN: null-ptr-deref in bio_alloc_bioset.cold+0x2a/0x16a
Read of size 8 at addr 0000000000000000 by task fio/1835
CPU: 5 PID: 1835 Comm: fio Not tainted 6.1.0-rc2+ #226
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
print_report+0x490/0x4a1
? __virt_addr_valid+0x28/0x140
? bio_alloc_bioset.cold+0x2a/0x16a
kasan_report+0xb3/0x130
? bio_alloc_bioset.cold+0x2a/0x16a
bio_alloc_bioset.cold+0x2a/0x16a
? bvec_alloc+0xf0/0xf0
? iov_iter_is_aligned+0x130/0x2c0
blkdev_direct_IO.part.0+0x16a/0x8d0
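For context, the crash is in the percpu-cache fast path of bio_alloc_bioset(): with the series applied, an empty task-context free list gets refilled by splicing in the IRQ-side list, but nothing catches the case where both lists are empty. A rough sketch of the failing path, using names from the series (free_list, free_list_irq, nr, bio_alloc_irq_cache_splice); the exact code in the series may differ:

	cache = per_cpu_ptr(bs->cache, get_cpu());
	if (!cache->free_list)
		bio_alloc_irq_cache_splice(cache);	/* may still leave free_list empty */
	bio = cache->free_list;			/* NULL if both lists were empty */
	cache->free_list = bio->bi_next;	/* <-- NULL pointer dereference */
	cache->nr--;

The empty-cache check this patch adds is what bails out before that dereference.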
I was going to apply this, but after running some testing, while it does
fix the initial crash, I still get weird corruption crashes with the
series it's fixing.
Pavel, I'm going to drop this series for now.
I found one yesterday. Is the issue reproducible?
Oh yeah, triggers in < 1 second for me when running my usual irq
peak bench:
t/io_uring -p0 -d128 -b512 -s32 -c32 -F1 -B1 -R0 -X1 -n24 -P1 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 /dev/nvme11n1 /dev/nvme12n1 /dev/nvme13n1 /dev/nvme14n1 /dev/nvme15n1 /dev/nvme16n1 /dev/nvme17n1 /dev/nvme18n1 /dev/nvme19n1 /dev/nvme20n1 /dev/nvme21n1 /dev/nvme22n1 /dev/nvme23n1
Interestingly, it doesn't trigger in qemu with just a single device.
The bug I mentioned is splicing from the in-IRQ put path, which modifies
the non-irq list. We need to hit the ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK
threshold in the cache to trigger it, so it makes sense that you only see
it with very high QD tests; that matches the profile.
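To spell out the corruption case: once the cache is over that threshold, the put path prunes, and when the put happens from hard-IRQ context that prune splices into and frees from cache->free_list, the non-irq list, which task context on the same CPU walks with nothing more than get_cpu() for protection. Illustrative interleaving only, not the literal code:

	/* task context: bio_alloc_bioset() percpu fast path */
	bio = cache->free_list;
	/*
	 * hard IRQ on the same CPU: bio_put() -> bio_put_percpu_cache()
	 * crosses ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK and prunes, rewriting
	 * and freeing entries on cache->free_list
	 */
	cache->free_list = bio->bi_next;	/* walks a list the IRQ just rewrote */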
I'll resend the patch set with a few changes, but it would be great
if you could check whether something like the patch below works for you:
diff --git a/block/bio.c b/block/bio.c
index 0686a3774157..af715aee239b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -764,6 +764,12 @@ static inline void bio_put_percpu_cache(struct bio *bio)
 	struct bio_alloc_cache *cache;
 
 	cache = per_cpu_ptr(bio->bi_pool->cache, get_cpu());
+	if (READ_ONCE(cache->nr_irq) + cache->nr > ALLOC_CACHE_MAX) {
+		put_cpu();
+		bio_free(bio);
+		return;
+	}
+
 	bio_uninit(bio);
 
 	if ((bio->bi_opf & REQ_POLLED) && !WARN_ON_ONCE(in_interrupt())) {
@@ -779,10 +785,6 @@ static inline void bio_put_percpu_cache(struct bio *bio)
 		cache->nr_irq++;
 		local_irq_restore(flags);
 	}
-
-	if (READ_ONCE(cache->nr_irq) + cache->nr >
-	    ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK)
-		bio_alloc_cache_prune(cache, ALLOC_CACHE_SLACK);
 	put_cpu();
 }
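The idea is to move the occupancy check to the top of bio_put_percpu_cache(): if the cache already holds more than ALLOC_CACHE_MAX entries, the bio is freed directly with bio_free() before either list is touched, so the prune disappears from the put path and cache->free_list is no longer modified from IRQ context there.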