On Mon, Feb 10, 2025 at 09:56:06AM +0800, yangge1116@xxxxxxx wrote:
> From: yangge <yangge1116@xxxxxxx>
>
> For different CMAs, concurrent allocation of CMA memory ideally should not
> require synchronization using locks. Currently, a global cma_mutex lock is
> employed to synchronize all CMA allocations, which can impact the
> performance of concurrent allocations across different CMAs.
>
> To test the performance impact, follow these steps:
> 1. Boot the kernel with the command line argument hugetlb_cma=30G to
>    allocate a 30GB CMA area specifically for huge page allocations. (note:
>    on my machine, which has 3 nodes, each node is initialized with 10G of
>    CMA)
> 2. Use the dd command with parameters if=/dev/zero of=/dev/shm/file bs=1G
>    count=30 to fully utilize the CMA area by writing zeroes to a file in
>    /dev/shm.
> 3. Open three terminals and execute the following commands simultaneously:
>    (Note: Each of these commands attempts to allocate 10GB [2621440 * 4KB
>    pages] of CMA memory.)
>    On Terminal 1: time echo 2621440 > /sys/kernel/debug/cma/hugetlb1/alloc
>    On Terminal 2: time echo 2621440 > /sys/kernel/debug/cma/hugetlb2/alloc
>    On Terminal 3: time echo 2621440 > /sys/kernel/debug/cma/hugetlb3/alloc
>
> We attempt to allocate pages through the CMA debug interface and use the
> time command to measure the duration of each allocation.
> Performance comparison:
>              Without this patch      With this patch
> Terminal1          ~7s                     ~7s
> Terminal2         ~14s                     ~8s
> Terminal3         ~21s                     ~7s
>
> To solve the problem above, we could use per-CMA locks to improve concurrent
> allocation performance. This would allow each CMA to be managed
> independently, reducing the need for a global lock and thus improving
> scalability and performance.
>
> Signed-off-by: yangge <yangge1116@xxxxxxx>

Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>

--
Oscar Salvador
SUSE Labs
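
For readers following the thread, the per-CMA locking idea described in the
quoted changelog can be sketched in userspace C. This is only an illustration
under stated assumptions, not the patch itself: struct area, area_init() and
area_alloc() are hypothetical names, a pthread mutex stands in for the kernel
mutex, and a simple page counter stands in for the real allocation work.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for one CMA area (not the kernel's struct cma). */
struct area {
	pthread_mutex_t alloc_lock;	/* per-area lock, instead of one global lock */
	size_t free_pages;		/* simplified bookkeeping for the sketch */
};

static void area_init(struct area *a, size_t pages)
{
	assert(a != NULL);
	pthread_mutex_init(&a->alloc_lock, NULL);
	a->free_pages = pages;
}

/*
 * Take 'count' pages from one area. Only this area's lock is held, so
 * callers working on other areas are not serialized behind this one.
 * Returns 0 on success, -1 if the area does not have enough free pages.
 */
static int area_alloc(struct area *a, size_t count)
{
	int ret = -1;

	assert(a != NULL);
	pthread_mutex_lock(&a->alloc_lock);
	if (a->free_pages >= count) {
		a->free_pages -= count;
		ret = 0;
	}
	pthread_mutex_unlock(&a->alloc_lock);
	return ret;
}
```

With a single global lock, the three allocations in the test above would run
one after another, which matches the ~7s/~14s/~21s column; with one lock per
area, an allocation only waits on other allocations from the same area.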