On Thu, Nov 21, 2024 at 11:06 AM Keith Busch <kbusch@xxxxxxxxxx> wrote:
> If you have the time, could you compare with using xarray instead?

Sure. Good idea.

**With the submitted patches applied AND using an xarray for vaddr-to-block translations:**

```
dmapool test: size:16 align:16 blocks:8192 time:37954
dmapool test: size:64 align:64 blocks:8192 time:40036
dmapool test: size:256 align:256 blocks:8192 time:41942
dmapool test: size:1024 align:1024 blocks:2048 time:10964
dmapool test: size:4096 align:4096 blocks:1024 time:6101
dmapool test: size:68 align:32 blocks:8192 time:41307
```

The xarray approach shows a slight improvement in performance compared to the maple tree approach.

FWIW, I implemented the two with slightly different semantics:

- In the maple tree implementation, I saved the `block`'s entire `vaddr` range, allowing any `vaddr` within the `block` to be passed to `dma_pool_free()`.
- In the xarray implementation, I saved only the `block`'s base `vaddr`, requiring `dma_pool_free()` to be called with the exact `vaddr` returned by `dma_pool_alloc()`. This aligns with the DMA pool API documentation, which specifies that the `vaddr` returned by `dma_pool_alloc()` should be passed to `dma_pool_free()`.

Let me know if you'd like further adjustments.

Thanks,
Brian Johannesmeyer
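
The difference between the two lookup semantics described above can be illustrated with a minimal userspace sketch. This is not the patch code: `struct block`, `lookup_range()`, and `lookup_exact()` are hypothetical names, and a linear scan over an array stands in for the maple tree and xarray lookups purely to show which addresses resolve to a block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pool block: base address and size in bytes. */
struct block {
    uintptr_t base;
    size_t size;
};

/*
 * Range semantics (as described for the maple tree version): any address
 * inside [base, base + size) resolves to the owning block.
 * A linear scan replaces the real tree lookup; this is an assumption
 * made only to keep the sketch self-contained.
 */
static const struct block *lookup_range(const struct block *blocks, size_t n,
                                        uintptr_t vaddr)
{
    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (vaddr >= blocks[i].base && vaddr - blocks[i].base < blocks[i].size)
            return &blocks[i];
    }
    return NULL;
}

/*
 * Exact semantics (as described for the xarray version): only the base
 * address that the allocator handed out resolves to the block.
 */
static const struct block *lookup_exact(const struct block *blocks, size_t n,
                                        uintptr_t vaddr)
{
    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (vaddr == blocks[i].base)
            return &blocks[i];
    }
    return NULL;
}
```

With two adjacent 64-byte blocks at `0x1000` and `0x1040`, `lookup_range()` finds the first block for `0x1010`, while `lookup_exact()` returns `NULL` for that address and succeeds only for `0x1000` or `0x1040`.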