On 10/27/22 21:52, Jens Axboe wrote:
On 10/27/22 2:45 PM, Pavel Begunkov wrote:
On 10/27/22 21:44, Jens Axboe wrote:
On 10/27/22 4:04 AM, Kanchan Joshi wrote:
If the cache does not have any entries, make sure to detect that and return
failure. Otherwise this leads to a null pointer dereference.
Fixes: 13a184e26965 ("block/bio: add pcpu caching for non-polling bio_put")
Signed-off-by: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
---
Can be reproduced by:
fio -direct=1 -iodepth=1 -rw=randread -ioengine=io_uring -bs=4k -numjobs=1 -size=4k -filename=/dev/nvme0n1 -hipri=1 -name=block
BUG: KASAN: null-ptr-deref in bio_alloc_bioset.cold+0x2a/0x16a
Read of size 8 at addr 0000000000000000 by task fio/1835
CPU: 5 PID: 1835 Comm: fio Not tainted 6.1.0-rc2+ #226
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
print_report+0x490/0x4a1
? __virt_addr_valid+0x28/0x140
? bio_alloc_bioset.cold+0x2a/0x16a
kasan_report+0xb3/0x130
? bio_alloc_bioset.cold+0x2a/0x16a
bio_alloc_bioset.cold+0x2a/0x16a
? bvec_alloc+0xf0/0xf0
? iov_iter_is_aligned+0x130/0x2c0
blkdev_direct_IO.part.0+0x16a/0x8d0
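For context, the crash is in the percpu-cache fast path of bio_alloc_bioset(): with the series applied, an empty task-context free list gets refilled by splicing in the IRQ-side list, but nothing catches the case where both lists are empty. A rough sketch of the failing path, using names from the series (free_list, free_list_irq, nr, bio_alloc_irq_cache_splice); the exact code in the series may differ:

	cache = per_cpu_ptr(bs->cache, get_cpu());
	if (!cache->free_list)
		bio_alloc_irq_cache_splice(cache);	/* may still leave free_list empty */
	bio = cache->free_list;			/* NULL if both lists were empty */
	cache->free_list = bio->bi_next;	/* <-- NULL pointer dereference */
	cache->nr--;

The empty-cache check this patch adds is what bails out before that dereference.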
I was going to apply this, but after running some testing, while it does
fix the initial crash, I still get weird corruption crashes with the
series it's fixing.
Pavel, I'm going to drop this series for now.
I found one yesterday. Is the issue reproducible?
Oh yeah, triggers in < 1 second for me when running my usual irq
peak bench:
t/io_uring -p0 -d128 -b512 -s32 -c32 -F1 -B1 -R0 -X1 -n24 -P1 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 /dev/nvme11n1 /dev/nvme12n1 /dev/nvme13n1 /dev/nvme14n1 /dev/nvme15n1 /dev/nvme16n1 /dev/nvme17n1 /dev/nvme18n1 /dev/nvme19n1 /dev/nvme20n1 /dev/nvme21n1 /dev/nvme22n1 /dev/nvme23n1
Interestingly, it doesn't trigger in qemu with just a single device.
The bug I mentioned is splicing from the in-IRQ put path, which modifies
the non-irq list. We need to hit the ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK
threshold in the cache to trigger it, so it makes sense that you only see
it with very high QD tests; that matches the profile.
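To spell out the corruption case: once the cache is over that threshold, the put path prunes, and when the put happens from hard-IRQ context that prune splices into and frees from cache->free_list, the non-irq list, which task context on the same CPU walks with nothing more than get_cpu() for protection. Illustrative interleaving only, not the literal code:

	/* task context: bio_alloc_bioset() percpu fast path */
	bio = cache->free_list;
	/*
	 * hard IRQ on the same CPU: bio_put() -> bio_put_percpu_cache()
	 * crosses ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK and prunes, rewriting
	 * and freeing entries on cache->free_list
	 */
	cache->free_list = bio->bi_next;	/* walks a list the IRQ just rewrote */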
I'll resend the patch set with a few changes, but it would be great
if you could check whether something like the patch below works for you:
diff --git a/block/bio.c b/block/bio.c
index 0686a3774157..af715aee239b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -764,6 +764,12 @@ static inline void bio_put_percpu_cache(struct bio *bio)
 	struct bio_alloc_cache *cache;
 
 	cache = per_cpu_ptr(bio->bi_pool->cache, get_cpu());
+	if (READ_ONCE(cache->nr_irq) + cache->nr > ALLOC_CACHE_MAX) {
+		put_cpu();
+		bio_free(bio);
+		return;
+	}
+
 	bio_uninit(bio);
 
 	if ((bio->bi_opf & REQ_POLLED) && !WARN_ON_ONCE(in_interrupt())) {
@@ -779,10 +785,6 @@ static inline void bio_put_percpu_cache(struct bio *bio)
 		cache->nr_irq++;
 		local_irq_restore(flags);
 	}
-
-	if (READ_ONCE(cache->nr_irq) + cache->nr >
-	    ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK)
-		bio_alloc_cache_prune(cache, ALLOC_CACHE_SLACK);
 	put_cpu();
 }
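The idea is to move the occupancy check to the top of bio_put_percpu_cache(): if the cache already holds more than ALLOC_CACHE_MAX entries, the bio is freed directly with bio_free() before either list is touched, so the prune disappears from the put path and cache->free_list is no longer modified from IRQ context there.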