On Wed, 1 Feb 2023 17:25:49 +0100 Alexander Halbuer <halbuer@xxxxxxxxxxxxxxxxxxx> wrote:

> The `rmqueue_bulk` function batches the allocation of multiple elements to
> refill the per-CPU buffers into a single hold of the zone lock. Each
> element is allocated and checked using the `check_pcp_refill` function.
> The check touches every related struct page which is especially expensive
> for higher order allocations (huge pages). This patch reduces the time
> holding the lock by moving the check out of the critical section similar
> to the `rmqueue_buddy` function which allocates a single element.
> Measurements of parallel allocation-heavy workloads show a reduction of
> the average huge page allocation latency of 50 percent for two cores and
> nearly 90 percent for 24 cores.

Sounds nice.

Were you able to test how much benefit we get by simply removing the
check_new_pages() call from rmqueue_bulk()?

Vlastimil, I find this quite confusing:

#ifdef CONFIG_DEBUG_VM
/*
 * With DEBUG_VM enabled, order-0 pages are checked for expected state when
 * being allocated from pcp lists. With debug_pagealloc also enabled, they are
 * also checked when pcp lists are refilled from the free lists.
 */
static inline bool check_pcp_refill(struct page *page, unsigned int order)
{
	if (debug_pagealloc_enabled_static())
		return check_new_pages(page, order);
	else
		return false;
}

static inline bool check_new_pcp(struct page *page, unsigned int order)
{
	return check_new_pages(page, order);
}
#else
/*
 * With DEBUG_VM disabled, free order-0 pages are checked for expected state
 * when pcp lists are being refilled from the free lists. With debug_pagealloc
 * enabled, they are also checked when being allocated from the pcp lists.
 */
static inline bool check_pcp_refill(struct page *page, unsigned int order)
{
	return check_new_pages(page, order);
}
static inline bool check_new_pcp(struct page *page, unsigned int order)
{
	if (debug_pagealloc_enabled_static())
		return check_new_pages(page, order);
	else
		return false;
}
#endif /* CONFIG_DEBUG_VM */

and the 4462b32c9285b5 changelog is a struggle to follow.

Why are we performing *any* checks when CONFIG_DEBUG_VM=n and when
debug_pagealloc_enabled is false?

Anyway, these checks sound quite costly so let's revisit their
desirability?
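
For reference, the idea in the quoted changelog (detach a batch under the
lock, run the per-page check after dropping it) can be sketched in userspace
roughly as below. This is only an illustration, not the patch itself:
fake_page, fake_zone, fake_page_is_bad() and bulk_alloc_check_outside_lock()
are all hypothetical names, a pthread mutex stands in for the zone lock, and
pages that fail the check are simply dropped, which may not match what the
actual patch does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel's definition. */
struct fake_page {
	struct fake_page *next;
	unsigned long flags;	/* nonzero means "bad state" in this sketch */
};

/* Hypothetical stand-in for a zone: one lock guarding one free list. */
struct fake_zone {
	pthread_mutex_t lock;
	struct fake_page *free_list;
};

/* Hypothetical check; stands in for an expensive per-page validation. */
static bool fake_page_is_bad(const struct fake_page *page)
{
	return page->flags != 0;
}

/*
 * Detach up to @count pages while holding the lock, then validate them
 * after the lock has been dropped. Good pages are stored in @out in list
 * order. Returns the number of good pages.
 */
static size_t bulk_alloc_check_outside_lock(struct fake_zone *zone,
					    size_t count,
					    struct fake_page **out)
{
	struct fake_page *batch = NULL, **tail = &batch;
	size_t taken = 0, good = 0;

	assert(zone && out);

	/* Critical section: list manipulation only, no per-page checks. */
	pthread_mutex_lock(&zone->lock);
	while (taken < count && zone->free_list) {
		struct fake_page *page = zone->free_list;

		zone->free_list = page->next;
		page->next = NULL;
		*tail = page;
		tail = &page->next;
		taken++;
	}
	pthread_mutex_unlock(&zone->lock);

	/* The expensive part runs without the lock held. */
	while (batch) {
		struct fake_page *page = batch;

		batch = page->next;
		page->next = NULL;
		if (!fake_page_is_bad(page))
			out[good++] = page;
	}

	return good;
}
```

The lock hold time in this sketch depends only on the number of list
operations, not on how long each check takes.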