The patch titled
     Subject: mm/dmapool.c: fix boundary comparison
has been added to the -mm tree.  Its filename is
     dmapool-fix-boundary-comparison.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/dmapool-fix-boundary-comparison.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/dmapool-fix-boundary-comparison.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Tony Battersby <tonyb@xxxxxxxxxxxxxxx>
Subject: mm/dmapool.c: fix boundary comparison

Patch series "mpt3sas and dmapool scalability", v4.

drivers/scsi/mpt3sas is running into a scalability problem with the
kernel's DMA pool implementation.  With a LSI/Broadcom SAS 9300-8i 12Gb/s
HBA and max_sgl_entries=256, during modprobe, mpt3sas does the equivalent
of:

	chain_dma_pool = dma_pool_create(size = 128);
	for (i = 0; i < 373959; i++) {
		dma_addr[i] = dma_pool_alloc(chain_dma_pool);
	}

And at rmmod, system shutdown, or system reboot, mpt3sas does the
equivalent of:

	for (i = 0; i < 373959; i++) {
		dma_pool_free(chain_dma_pool, dma_addr[i]);
	}
	dma_pool_destroy(chain_dma_pool);

With this usage, both dma_pool_alloc() and dma_pool_free() exhibit O(n^2)
complexity, although dma_pool_free() is much worse due to implementation
details.  On my system, the dma_pool_free() loop above takes about 9
seconds to run.  Note that the problem was even worse before commit
74522a92bbf0 ("scsi: mpt3sas: Optimize I/O memory consumption in
driver."), where the dma_pool_free() loop could take ~30 seconds.

mpt3sas also has some other DMA pools, but chain_dma_pool is the only one
with so many allocations:

cat /sys/devices/pci0000:80/0000:80:07.0/0000:85:00.0/pools
(manually cleaned up column alignment)
poolinfo - 0.1
reply_post_free_array pool  1      21     192     1
reply_free pool             1      1      41728   1
reply pool                  1      1      1335296 1
sense pool                  1      1      970272  1
chain pool                  373959 386048 128     12064
reply_post_free pool        12     12     166528  12

The patches in this series improve the scalability of the DMA pool
implementation, which significantly reduces the running time of the DMA
alloc/free loops.  With the patches applied, "modprobe mpt3sas", "rmmod
mpt3sas", and system shutdown/reboot with mpt3sas loaded are significantly
faster.  Here are some benchmarks (of DMA alloc/free only, not the entire
modprobe/rmmod):

This patch (of 9):

Fix the boundary comparison when constructing the list of free blocks for
the case that 'size' is a power of two.  Since 'boundary' is also a
power of two, that would make 'boundary' a multiple of 'size', in which
case a single block would never cross the boundary.  This bug would cause
some of the allocated memory to be wasted (but not leaked).

Example:

size       = 512
boundary   = 2048
allocation = 4096

Address range
   0 -  511
 512 - 1023
1024 - 1535
1536 - 2047 *
2048 - 2559
2560 - 3071
3072 - 3583
3584 - 4095 *

Prior to this fix, the address ranges marked with "*" would not have been
used even though they didn't cross the given boundary.

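The effect of the comparison can be checked outside the kernel with a
small userspace model.  The sketch below is an illustration only, not
kernel code: count_blocks() and its 'inclusive' flag are hypothetical
names, and the loop merely mirrors the shape of the free-list loop
touched by this patch, counting blocks instead of writing offsets into a
page.

```c
/*
 * Userspace model of the free-list construction loop (hypothetical
 * helper, not part of mm/dmapool.c).  It returns how many blocks of
 * 'size' bytes end up on the free list for one page of 'allocation'
 * bytes.  'inclusive' selects the old ">=" comparison (non-zero) or
 * the fixed ">" comparison (zero).
 */
static unsigned int count_blocks(unsigned int size, unsigned int boundary,
				 unsigned int allocation, int inclusive)
{
	unsigned int offset = 0;
	unsigned int next_boundary = boundary;
	unsigned int blocks = 0;

	do {
		unsigned int next = offset + size;
		/* Would the block that starts at 'next' cross the boundary? */
		int crosses = inclusive ? (next + size) >= next_boundary
					: (next + size) > next_boundary;

		if (crosses) {
			/* Skip ahead to the start of the next boundary window. */
			next = next_boundary;
			next_boundary += boundary;
		}
		blocks++;	/* the block starting at 'offset' is usable */
		offset = next;
	} while (offset < allocation);

	return blocks;
}
```

With the values from the example (size 512, boundary 2048, allocation
4096), count_blocks(512, 2048, 4096, 1) returns 6 and
count_blocks(512, 2048, 4096, 0) returns 8, which matches the two
ranges marked with "*" being skipped by the old comparison.
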
Link: http://lkml.kernel.org/r/acce3a38-9930-349d-5299-95d2aa5c47e4@xxxxxxxxxxxxxxx
Fixes: e34f44b3517f ("pool: Improve memory usage for devices which can't cross boundaries")
Signed-off-by: Tony Battersby <tonyb@xxxxxxxxxxxxxxx>
Acked-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Cc: Andy Shevchenko <andy.shevchenko@xxxxxxxxx>
Cc: John Garry <john.garry@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/dmapool.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/dmapool.c~dmapool-fix-boundary-comparison
+++ a/mm/dmapool.c
@@ -210,7 +210,7 @@ static void pool_initialise_page(struct
 
 	do {
 		unsigned int next = offset + pool->size;
-		if (unlikely((next + pool->size) >= next_boundary)) {
+		if (unlikely((next + pool->size) > next_boundary)) {
 			next = next_boundary;
 			next_boundary += pool->boundary;
 		}
_

Patches currently in -mm which might be from tonyb@xxxxxxxxxxxxxxx are

dmapool-fix-boundary-comparison.patch
dmapool-remove-checks-for-dev-==-null.patch
dmapool-cleanup-dma_pool_destroy.patch
dmapool-improve-scalability-of-dma_pool_alloc.patch
dmapool-rename-fields-in-dma_page.patch
dmapool-improve-scalability-of-dma_pool_free.patch
dmapool-cleanup-integer-types.patch
dmapool-improve-accuracy-of-debug-statistics.patch
dmapool-debug-prevent-endless-loop-in-case-of-corruption.patch