On Tue, May 14, 2024 at 1:25 PM Chenliang Li <cliang01.li@xxxxxxxxxxx> wrote:
>
> Registered buffers are stored and processed in the form of a bvec
> array; each bvec element typically points to a PAGE_SIZE page but can
> also work with hugepages. Specifically, a buffer backed by a single
> hugepage is coalesced into one hugepage bvec entry during
> registration. This coalescing saves both memory and DMA-mapping time.
>
> However, the coalescing currently doesn't work for multi-hugepage
> buffers. For a buffer with several 2M hugepages, we still split it
> into thousands of 4K page bvec entries when, in fact, a handful of
> hugepage bvecs would do.
>
> This patch series enables coalescing registered buffers spanning more
> than one hugepage. It reduces DMA-mapping time and saves memory for
> such buffers.
>
> Testing:
>
> Hugepage fixed buffer I/O can be tested with unmodified fio; the fio
> command used in the following test is given in [1]. There is also a
> liburing test case in [2]. The system must have enough hugepages
> available before testing.
>
> Perf diff of an 8M (4 x 2M hugepages) fio randread test:
>
> Before     After      Symbol
> .....................................................
> 4.68%                 [k] __blk_rq_map_sg
> 3.31%                 [k] dma_direct_map_sg
> 2.64%                 [k] dma_pool_alloc
> 1.09%                 [k] sg_next
>            +0.49%     [k] dma_map_page_attrs
>
> Perf diff of an 8M fio randwrite test:
>
> Before     After      Symbol
> ......................................................
> 2.82%                 [k] __blk_rq_map_sg
> 2.05%                 [k] dma_direct_map_sg
> 1.75%                 [k] dma_pool_alloc
> 0.68%                 [k] sg_next
>            +0.08%     [k] dma_map_page_attrs
>
> The first three patches prepare for adding multi-hugepage coalescing
> to buffer registration; the fourth patch enables the feature.
>
> -----------------
> Changes since v3:
>
> - Delete an unnecessary commit message
> - Update the test command and test results
>
> v3 : https://lore.kernel.org/io-uring/20240514001614.566276-1-cliang01.li@xxxxxxxxxxx/T/#t
>
> Changes since v2:
>
> - Modify the loop iterator increment to make the code cleaner
> - Minor fix to the return path in coalesced buffer accounting
> - Correct commit messages
> - Add test cases to liburing
>
> v2 : https://lore.kernel.org/io-uring/20240513020149.492727-1-cliang01.li@xxxxxxxxxxx/T/#t
>
> Changes since v1:
>
> - Split into 4 patches
> - Fix code style issues
> - Rearrange the code changes for a cleaner look
> - Add a specialized pinned-page accounting procedure for coalesced
>   buffers
> - Reorder the newly added fields in the imu struct for better
>   compaction
>
> v1 : https://lore.kernel.org/io-uring/20240506075303.25630-1-cliang01.li@xxxxxxxxxxx/T/#u
>
> [1]
> fio -iodepth=64 -rw=randread(-rw=randwrite) -direct=1 -ioengine=io_uring \
> -bs=8M -numjobs=1 -group_reporting -mem=shmhuge -fixedbufs -hugepage-size=2M \
> -filename=/dev/nvme0n1 -runtime=10s -name=test1
>
> [2]
> https://lore.kernel.org/io-uring/20240514051343.582556-1-cliang01.li@xxxxxxxxxxx/T/#u
>
> Chenliang Li (4):
>   io_uring/rsrc: add hugepage buffer coalesce helpers
>   io_uring/rsrc: store folio shift and mask into imu
>   io_uring/rsrc: add init and account functions for coalesced imus
>   io_uring/rsrc: enable multi-hugepage buffer coalescing
>
>  io_uring/rsrc.c | 217 +++++++++++++++++++++++++++++++++++++++---------
>  io_uring/rsrc.h |  12 +++
>  2 files changed, 191 insertions(+), 38 deletions(-)
>
>
> base-commit: 59b28a6e37e650c0d601ed87875b6217140cda5d
> --
> 2.34.1
>

I tested this series by registering multi-hugepage buffers.
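
For anyone wanting to reproduce this outside of fio, something along
the lines of the sketch below is enough to exercise the registration
path. This is illustrative only (not the liburing test case from [2]);
the 8M size mirrors the fio job, and 2M hugepages must be reserved
beforehand, e.g. via /proc/sys/vm/nr_hugepages.

#include <liburing.h>
#include <sys/mman.h>
#include <stdio.h>

#define BUF_SIZE (4UL * 2 * 1024 * 1024)	/* 8M: 4 x 2M hugepages */

int main(void)
{
	struct io_uring ring;
	struct iovec iov;
	void *buf;
	int ret;

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret) {
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}

	/*
	 * Back the buffer with 2M hugepages so it is eligible for
	 * coalescing at registration time.
	 */
	buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	iov.iov_base = buf;
	iov.iov_len = BUF_SIZE;

	/*
	 * With this series, the 8M buffer should be tracked as a few
	 * hugepage-sized bvecs instead of 2048 4K entries.
	 */
	ret = io_uring_register_buffers(&ring, &iov, 1);
	if (ret)
		fprintf(stderr, "register_buffers: %d\n", ret);
	else
		io_uring_unregister_buffers(&ring);

	munmap(buf, BUF_SIZE);
	io_uring_queue_exit(&ring);
	return ret ? 1 : 0;
}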
The coalescing helps save DMA-mapping time. This is the gain observed
on my setup while running the fio workload shared in [1].

RandomRead:

Baseline    DeltaAbs    Symbol
.....................................................
3.89%       -3.62%      [k] blk_rq_map_sg
3.58%       -3.23%      [k] dma_direct_map_sg
2.25%       -2.23%      [k] sg_next

RandomWrite:

Baseline    DeltaAbs    Symbol
.....................................................
2.46%       -2.31%      [k] dma_direct_map_sg
2.06%       -2.05%      [k] sg_next
2.08%       -1.80%      [k] blk_rq_map_sg

The liburing test case shared in [2] also works fine on my setup.

Feel free to add:
Tested-by: Anuj Gupta <anuj20.g@xxxxxxxxxxx>

--
Anuj Gupta