I’ll go ahead and prepare a V3 patch series with the following updates:
- Using an xarray for vaddr-to-block translations, which improves the
  performance of free operations.
- Removing the minimum DMA block size constraint, as it is no longer
  necessary.

Let me know if there are any additional suggestions or concerns to
address before submission.

Thanks,
Brian

On Thu, Nov 21, 2024 at 12:07 PM Brian Johannesmeyer <bjohannesmeyer@xxxxxxxxx> wrote:
>
> On Thu, Nov 21, 2024 at 11:06 AM Keith Busch <kbusch@xxxxxxxxxx> wrote:
> > If you have the time, could you compare with using xarray instead?
>
> Sure. Good idea.
>
> **With the submitted patches applied AND using an xarray for
> vaddr-to-block translations:**
> ```
> dmapool test: size:16 align:16 blocks:8192 time:37954
> dmapool test: size:64 align:64 blocks:8192 time:40036
> dmapool test: size:256 align:256 blocks:8192 time:41942
> dmapool test: size:1024 align:1024 blocks:2048 time:10964
> dmapool test: size:4096 align:4096 blocks:1024 time:6101
> dmapool test: size:68 align:32 blocks:8192 time:41307
> ```
>
> The xarray approach shows a slight improvement in performance compared
> to the maple tree approach.
>
> FWIW, I implemented the two with slightly different semantics:
> - In the maple tree implementation, I saved the `block`'s entire
> `vaddr` range, allowing any `vaddr` within the `block` to be passed to
> `dma_pool_free()`.
> - In the xarray implementation, I saved only the `block`'s base
> `vaddr`, requiring `dma_pool_free()` to be called with the exact
> `vaddr` returned by `dma_pool_alloc()`. This aligns with the DMA pool
> API documentation, which specifies that the `vaddr` returned by
> `dma_pool_alloc()` should be passed to `dma_pool_free()`.
>
> Let me know if you'd like further adjustments.
>
> Thanks,
>
> Brian Johannesmeyer
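
For readers following the semantics discussion quoted above, here is a
rough userspace sketch of the difference between the two lookup styles.
It is not the patch code and does not use the kernel xarray or maple
tree APIs; `struct block_map`, `map_insert()`, `lookup_exact()` and
`lookup_range()` are hypothetical names, and a sorted array stands in
for the real data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; NOT the kernel's xarray or maple tree. */
#define MAX_BLOCKS 16

struct block_map {
    uintptr_t base[MAX_BLOCKS]; /* block base addresses, kept sorted */
    size_t    count;
    size_t    block_size;       /* all blocks in one pool share a size */
};

/* Insert a base address, keeping the array sorted (insertion sort step). */
static void map_insert(struct block_map *m, uintptr_t base)
{
    size_t i = m->count;

    assert(m->count < MAX_BLOCKS);
    while (i > 0 && m->base[i - 1] > base) {
        m->base[i] = m->base[i - 1];
        i--;
    }
    m->base[i] = base;
    m->count++;
}

/* Binary search: index of the last base <= addr, or -1 if there is none. */
static long map_floor(const struct block_map *m, uintptr_t addr)
{
    size_t lo = 0, hi = m->count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (m->base[mid] <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (long)lo - 1;
}

/* Exact-key semantics: only the base address itself matches. */
static long lookup_exact(const struct block_map *m, uintptr_t addr)
{
    long i = map_floor(m, addr);

    return (i >= 0 && m->base[i] == addr) ? i : -1;
}

/* Range semantics: any address in [base, base + block_size) matches. */
static long lookup_range(const struct block_map *m, uintptr_t addr)
{
    long i = map_floor(m, addr);

    return (i >= 0 && addr - m->base[i] < m->block_size) ? i : -1;
}
```

With a 64-byte block at base 0x1040, `lookup_exact()` accepts only
0x1040, while `lookup_range()` also accepts an interior address such as
0x1048. The exact-key style corresponds to requiring the `vaddr`
returned by `dma_pool_alloc()`.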