The put side of the percpu bio caching mainly targets completions in
hard irq context, but the context is not guaranteed, so we guard
against the other cases by disabling interrupts. Disabling interrupts
while they are already disabled is supposed to be fast, but profiling
shows it is far from free.

Instead, we can infer the interrupt state from in_hardirq(), which is
just a fast variable read, and fall back to the normal bio_free()
otherwise. With that, the caching no longer covers softirq/task
completions, but that should be fine; we have never measured whether
caching brings any benefit in those scenarios.

Profiling indicates that the bio_put() cost is reduced by ~3.5x
(1.76% -> 0.49%), and throughput of CPU-bound benchmarks improves by
around 1% (t/io_uring with high QD and several drives).

Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
---
 block/bio.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 816d412c06e9..a8a4f3211893 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -763,26 +763,29 @@ static inline void bio_put_percpu_cache(struct bio *bio)
 
 	cache = per_cpu_ptr(bio->bi_pool->cache, get_cpu());
 	if (READ_ONCE(cache->nr_irq) + cache->nr > ALLOC_CACHE_MAX) {
+free:
 		put_cpu();
 		bio_free(bio);
 		return;
 	}
 
-	bio_uninit(bio);
-
-	if ((bio->bi_opf & REQ_POLLED) && !WARN_ON_ONCE(in_interrupt())) {
+	if (bio->bi_opf & REQ_POLLED) {
+		if (WARN_ON_ONCE(!in_task()))
+			goto free;
+		bio_uninit(bio);
 		bio->bi_next = cache->free_list;
 		bio->bi_bdev = NULL;
 		cache->free_list = bio;
 		cache->nr++;
-	} else {
-		unsigned long flags;
+	} else if (in_hardirq()) {
+		lockdep_assert_irqs_disabled();
 
-		local_irq_save(flags);
+		bio_uninit(bio);
 		bio->bi_next = cache->free_list_irq;
 		cache->free_list_irq = bio;
 		cache->nr_irq++;
-		local_irq_restore(flags);
+	} else {
+		goto free;
 	}
 	put_cpu();
 }
-- 
2.43.0
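
For reference, below is a sketch of how bio_put_percpu_cache() reads
with the patch applied, pieced together from the hunk above. The local
"struct bio_alloc_cache *cache;" declaration sits outside the hunk and
is assumed unchanged; the comments are editorial, not part of the
patch.

static inline void bio_put_percpu_cache(struct bio *bio)
{
	struct bio_alloc_cache *cache;

	cache = per_cpu_ptr(bio->bi_pool->cache, get_cpu());
	if (READ_ONCE(cache->nr_irq) + cache->nr > ALLOC_CACHE_MAX) {
free:
		put_cpu();
		bio_free(bio);
		return;
	}

	if (bio->bi_opf & REQ_POLLED) {
		/*
		 * Polled bios are reaped from task context; anything
		 * else is unexpected, so warn and fall back to freeing.
		 */
		if (WARN_ON_ONCE(!in_task()))
			goto free;
		bio_uninit(bio);
		bio->bi_next = cache->free_list;
		bio->bi_bdev = NULL;
		cache->free_list = bio;
		cache->nr++;
	} else if (in_hardirq()) {
		/*
		 * Hard irq context implies interrupts are already off,
		 * so free_list_irq can be updated without irq_save.
		 */
		lockdep_assert_irqs_disabled();

		bio_uninit(bio);
		bio->bi_next = cache->free_list_irq;
		cache->free_list_irq = bio;
		cache->nr_irq++;
	} else {
		/* softirq/task completions: no caching, plain free */
		goto free;
	}
	put_cpu();
}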