Hi,

As some of you are probably aware, I've been chasing efficiency for block
devices using io_uring. Currently we can do about 5.1M IOPS using a single
logical CPU on one physical core, but that's with a kernel config that's been
shaved down. With BLK_CGROUP enabled, performance drops to ~3.6M IOPS for the
same test.

These two patches bring that back up to 3.9M IOPS, which is a nice
improvement. There's more work to be done here.

-- 
Jens Axboe