On 4/26/21 7:48 AM, Christoph Hellwig wrote:
> Hi all,
>
> This series clean up the block polling code a bit and changes the interface
> to poll for a specific bio instead of a request_queue and cookie pair.
>
> Polling for the bio itself leads to a few advantages:
>
>  - the cookie construction can made entirely private in blk-mq.c
>  - the caller does not need to remember the request_queue and cookie
>    separately and thus sidesteps their lifetime issues
>  - keeping the device and the cookie inside the bio allows to trivially
>    support polling BIOs remapping by stacking drivers
>  - a lot of code to propagate the cookie back up the submission path can
>    removed entirely
>
> The one major caveat is that this requires RCU freeing polled BIOs to make
> sure the bio that contains the polling information is still alive when
> io_uring tries to poll it through the iocb. For synchronous polling all the
> callers have a bio reference anyway, so this is not an issue.

Was curious about this separately, so ran a quick test on it. Running
polled IO on a fast device, performance drops about 10% with this
applied. Outside of that, we have ksoftirqd using 5-7% of CPU
continually, just doing frees:

+   45.33%  ksoftirqd/0  [kernel.vmlinux]  [k] __slab_free
+   15.91%  ksoftirqd/0  [kernel.vmlinux]  [k] kmem_cache_free
+   12.66%  ksoftirqd/0  [kernel.vmlinux]  [k] rcu_cblist_dequeue
+    8.39%  ksoftirqd/0  [kernel.vmlinux]  [k] rcu_core
+    4.75%  ksoftirqd/0  [kernel.vmlinux]  [k] free_one_page
+    3.27%  ksoftirqd/0  [kernel.vmlinux]  [k] bio_free_rcu
+    1.98%  ksoftirqd/0  [kernel.vmlinux]  [k] mempool_free_slab

This all means that we go from 2.97M IOPS to 2.70M IOPS in that
particular test (QD=128, async polled).

I was separately curious about this as I have a (as of yet unposted)
patchset that recycles bio allocations, as we spend quite a bit of time
doing that for high rate polled IO. It's good for taking the above 2.97M
IOPS to 3.2-3.3M IOPS, and it'd obviously be a bit more problematic with
required RCU freeing of bio's. Even without the alloc cache, using RCU
will ruin any potential cache locality on back-to-back bio free + bio
alloc.

-- 
Jens Axboe
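
The locality point above can be illustrated with a toy user-space model. This is only a sketch and not kernel code: `Bio`, `ImmediateFreeCache`, `DeferredFreeCache`, `grace_period_elapsed` and `reuse_count` are hypothetical names made up for the illustration, and the "grace period" is just a queue that is drained by hand. It assumes a LIFO free list, so a free followed directly by an alloc hands back the same (presumably cache-hot) object, while a deferred free cannot.

```python
from collections import deque


class Bio:
    """Hypothetical stand-in for a bio; not the kernel structure."""


class ImmediateFreeCache:
    """LIFO free list: the most recently freed object is handed out next."""

    def __init__(self):
        self._free = []

    def alloc(self):
        # Reuse the most recently freed object if there is one.
        return self._free.pop() if self._free else Bio()

    def free(self, bio):
        self._free.append(bio)


class DeferredFreeCache(ImmediateFreeCache):
    """Frees are parked until a simulated grace period has elapsed."""

    def __init__(self):
        super().__init__()
        self._pending = deque()

    def free(self, bio):
        # The object is not reusable yet; it only becomes so later.
        self._pending.append(bio)

    def grace_period_elapsed(self):
        # Move everything that was waiting onto the free list.
        while self._pending:
            self._free.append(self._pending.popleft())


def reuse_count(cache, rounds):
    """Count how often free + alloc back-to-back returns the same object."""
    hits = 0
    bio = cache.alloc()
    for _ in range(rounds):
        cache.free(bio)
        new = cache.alloc()
        hits += new is bio
        bio = new
    return hits
```

With immediate frees every back-to-back free + alloc pair reuses the same object; with deferred frees none do until `grace_period_elapsed()` is called, which is the effect described for RCU-freed bios.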