On 9/25/2024 12:12, Pavel Begunkov wrote:
> I don't have a strong opinion on the feature, but the open question
> we should get some decision on is whether it's really well applicable to
> a good enough set of apps / workloads, if it'll even be useful in the
> future and/or for other vendors, and if the merit outweighs extra
> 8 bytes + 1 flag per io_kiocb and the overhead of 1-2 static key'able
> checks in hot paths.

IMHO, releasing some CPU resources during polling may help workloads
whose performance is bottlenecked by limited CPU resources, such as some
database applications. Besides completing I/O operations, these
applications also need the CPU to process data, e.g. for compression and
decompression. Under high concurrency, polling consumes a lot of CPU
time, and computation and data processing must compete for that same
time, so the application's performance becomes difficult to improve.

RocksDB's MultiRead interface has already been adapted to io_uring. I
used db_bench to construct a scenario with high CPU pressure and
compared the performance.
The test configuration is as follows:

-------------------------------------------------------------------
CPU Model    Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
CPU Cores    8
Memory       16G
SSD          Samsung PM9A3
-------------------------------------------------------------------

Test case:
./db_bench --benchmarks=multireadrandom,stats --duration=60 \
    --threads=4/8/16 --use_direct_reads=true \
    --db=/mnt/rocks/test_db --wal_dir=/mnt/rocks/test_db \
    --key_size=4 --value_size=4096 -cache_size=0 -use_existing_db=1 \
    -batch_size=256 -multiread_batched=true -multiread_stride=0

------------------------------------------------------
Test result:
          Native       Optimization
threads   ops/sec      ops/sec         CPU Utilization
16        139300       189075          100%*8
8         138639       133191          90%*8
4         71475        68361           90%*8
------------------------------------------------------

When the number of threads exceeds the number of CPU cores, the native
database throughput does not increase significantly. However, hybrid
polling can release some CPU resources during polling, so part of the
CPU time can be spent on the frequent data processing and other
operations. This speeds up the read path, which improves throughput and
overall database performance. I tried different compression strategies
and got results similar to the table above (~30% throughput
improvement).

As more database applications adopt the io_uring engine, I think hybrid
polling may have potential in some scenarios.

--
Xue