Hi,

Currently passthru IO is slower than bdev O_DIRECT. One of the reasons
is that we do two allocations for each IO:

- One alloc+free for the page array for mapping the data
- One alloc+free of the bio

Let passthru IO dip into the bio cache to eliminate the bio alloc+free,
and use UIO_FASTIOV to gate whether we need to alloc+free the page array
for mapping purposes.

This closes about half of the gap between passthru and bdev dio for me.
If we can sanely wire up completion batching for passthru, then that
would almost fully close the gap. Outside of that, the main missing
feature for passthru is the ability to use registered buffers with
io_uring, as the per-io get_user_pages() is a large cycle consumer as
well.

-- 
Jens Axboe
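
For illustration only, the UIO_FASTIOV gating mentioned above can be sketched in plain userspace C. This is not the code from the series: FAST_PAGES, struct fake_page, get_page_array(), put_page_array() and map_request() are all hypothetical names, and FAST_PAGES is simply assumed to mirror the kernel's UIO_FASTIOV value of 8. The idea shown is that small requests use an on-stack page array and only larger ones pay for an alloc+free.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: mirrors the kernel's UIO_FASTIOV, which is 8. */
#define FAST_PAGES 8

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct fake_page { int id; };

/*
 * Return an array with room for nr_pages pointers. Small requests reuse
 * the caller's on-stack array; larger ones fall back to the heap.
 */
static struct fake_page **get_page_array(struct fake_page **stack_pages,
                                         size_t nr_pages)
{
    if (nr_pages <= FAST_PAGES)
        return stack_pages;
    return malloc(nr_pages * sizeof(*stack_pages));
}

/* Free only what get_page_array() actually allocated. */
static void put_page_array(struct fake_page **pages,
                           struct fake_page **stack_pages)
{
    if (pages != stack_pages)
        free(pages);
}

/*
 * Hypothetical mapping step. Returns 1 if the heap was needed, 0 if the
 * on-stack array was enough, and -1 if the allocation failed.
 */
static int map_request(size_t nr_pages)
{
    struct fake_page *stack_pages[FAST_PAGES];
    struct fake_page **pages = get_page_array(stack_pages, nr_pages);
    int used_heap;

    if (!pages)
        return -1;
    used_heap = (pages != stack_pages);
    /* Real code would pin and map the user pages into 'pages' here. */
    put_page_array(pages, stack_pages);
    return used_heap;
}
```

With this sketch, map_request(8) stays on the stack while map_request(9) takes the heap path, which is the per-IO cost the gating is meant to avoid for small transfers.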