> Given that you now need an array of the blocks anyway, it might make > sense to switch from a linked list to a bitmap for tracking free state, > which would be a lot more efficient as you only need a bit per block > as tracking overhead instead of a two pointers and a dma_addr_t. > > e.g. do a find_first_zero_bit() to find the ffree slot, then calculate > the dma_addr and virt address by simple offseting into the dma_page > ones with bitnr * pool->size. Thank you for the suggestion. I hacked together a bitmap-based approach as you proposed, and while it does improve memory efficiency by reducing the per-block metadata overhead, it unfortunately appears to significantly impact the runtime performance. Here are the performance results, with DMAPOOL_DEBUG disabled. The first two sets of numbers are the same as my latest response in the other thread (i.e., [RFC v2 0/2]), and the last set of numbers is with the bitmap approach applied: **Without no patches applied:** ``` dmapool test: size:16 align:16 blocks:8192 time:11860 dmapool test: size:64 align:64 blocks:8192 time:11951 dmapool test: size:256 align:256 blocks:8192 time:12287 dmapool test: size:1024 align:1024 blocks:2048 time:3134 dmapool test: size:4096 align:4096 blocks:1024 time:1686 dmapool test: size:68 align:32 blocks:8192 time:12050 ``` **With the submitted patches applied:** ``` dmapool test: size:16 align:16 blocks:8192 time:34432 dmapool test: size:64 align:64 blocks:8192 time:62262 dmapool test: size:256 align:256 blocks:8192 time:238137 dmapool test: size:1024 align:1024 blocks:2048 time:61386 dmapool test: size:4096 align:4096 blocks:1024 time:75342 dmapool test: size:68 align:32 blocks:8192 time:88243 ``` **With the submitted patches applied AND using a bitmap approach:** ``` dmapool test: size:16 align:16 blocks:8192 time:82733 dmapool test: size:64 align:64 blocks:8192 time:198460 dmapool test: size:256 align:256 blocks:8192 time:710316 dmapool test: size:1024 align:1024 blocks:2048 time:177801 dmapool test: size:4096 align:4096 blocks:1024 time:192297 dmapool test: size:68 align:32 blocks:8192 time:274931 ``` My guess as to why: The current linked list implementation allows us to find the next free block in constant time (`O(1)`) by directly dereferencing `pool->next_block`, and then following the `next_block` pointers for subsequent free blocks. In contrast, the bitmap approach requires iterating over all pages in `page->page_list` and, for each page, iterating through its bitmap to find the first zero bit. This results in a worst-case complexity of `O(n * b)`, where `n` is the number of pages and `b` is the number of bits in each page's bitmap. If you have ideas for mitigating this runtime overhead, I’d be happy to explore them further. Thanks, Brian Johannesmeyer