On 2024/12/11 20:52, Yunsheng Lin wrote:

> It seems that bottleneck is still the freeing side that the above
> result might not be as meaningful as it should be.

Annotating with 'perf top', there seems to be about 70%+ CPU usage for
the atomic operation of put_page_testzero() in page_frag_free(). It was
unexpected that the atomic operation had that much overhead :(

>
> As we can't use more than one cpu for the free side without some
> lock using a single ptr_ring, it seems something more complicated
> might need to be done in order to support more than one CPU for the
> freeing side?
>
> Before patch 1, __page_frag_alloc_align took up to 3.62% percent of
> CPU using 'perf top'.
> After patch 1, __page_frag_cache_prepare() and __page_frag_cache_commit_noref()
> took up to 4.67% + 1.01% = 5.68%.
> Having a similar result, I am not sure if the CPU usages is able tell us
> the performance degradation here as it seems to be quite large?
>

Using 'struct page_frag' to pass the parameter seems to cause some
observable overhead, as the testing is very low level. The performance
degradation seems to be negligible when using the below patch to avoid
passing 'struct page_frag': the CPU usage is 3.62% for
__page_frag_alloc_align() before patch 1 and 3.27% for
__page_frag_cache_prepare() after patch 1.

The new refactoring avoids some overhead for the old API, but might cause
some overhead for the new API, as it is not able to skip the
virt_to_page() for the refilling and reusing case, though that seems to
be an unlikely case.

Or is there any better idea for how to do the refactoring to unify the
page_frag API?

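For illustration only, the prepare/commit split can be modelled in userspace roughly as below. This is a sketch under stated assumptions, not the kernel code: 'struct frag_cache', frag_cache_init(), frag_cache_prepare() and frag_alloc_align() are hypothetical stand-ins, the backing store is a fixed 4096-byte array instead of a page, and the refill path is left out (prepare simply fails when the buffer is exhausted). It only shows that prepare records the aligned offset and returns the base address, while the inline wrapper adds the offset and does the accounting.

```c
#include <assert.h>
#include <stddef.h>

#define FRAG_BUF_SIZE 4096u

/* Hypothetical userspace stand-in for struct page_frag_cache. */
struct frag_cache {
	char buf[FRAG_BUF_SIZE];	/* models the backing page */
	unsigned int offset;		/* next free byte in buf */
	unsigned int pagecnt_bias;	/* models the reference bias */
};

static void frag_cache_init(struct frag_cache *nc)
{
	nc->offset = 0;
	nc->pagecnt_bias = FRAG_BUF_SIZE + 1;
}

/*
 * Prepare step: align the current offset and check that the fragment
 * fits. On success the aligned offset is stored in nc->offset and the
 * base address is returned. Nothing is consumed yet.
 * align_mask follows the ~(align - 1) convention; ~0u means unaligned.
 */
static void *frag_cache_prepare(struct frag_cache *nc, unsigned int fragsz,
				unsigned int align_mask)
{
	unsigned int offset;

	assert(nc != NULL);

	offset = (nc->offset + ~align_mask) & align_mask;
	if (offset > FRAG_BUF_SIZE || fragsz > FRAG_BUF_SIZE - offset)
		return NULL;	/* the refill path is not modelled */

	nc->offset = offset;
	return nc->buf;
}

/*
 * Commit step folded into the inline wrapper: turn the base address
 * into the fragment address, then advance the offset and drop the bias.
 */
static inline void *frag_alloc_align(struct frag_cache *nc,
				     unsigned int fragsz,
				     unsigned int align_mask)
{
	char *va = frag_cache_prepare(nc, fragsz, align_mask);

	if (va) {
		va += nc->offset;
		nc->offset += fragsz;
		nc->pagecnt_bias--;
	}

	return va;
}
```

In this model a failed prepare leaves both the offset and the bias untouched, so a caller can retry or fall back without undoing anything.
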
diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
index 41a91df82631..b83e7655654e 100644
--- a/include/linux/page_frag_cache.h
+++ b/include/linux/page_frag_cache.h
@@ -39,8 +39,24 @@ static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
 
 void page_frag_cache_drain(struct page_frag_cache *nc);
 void __page_frag_cache_drain(struct page *page, unsigned int count);
-void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz,
-			      gfp_t gfp_mask, unsigned int align_mask);
+void *__page_frag_cache_prepare(struct page_frag_cache *nc, unsigned int fragsz,
+				gfp_t gfp_mask, unsigned int align_mask);
+
+static inline void *__page_frag_alloc_align(struct page_frag_cache *nc,
+					    unsigned int fragsz, gfp_t gfp_mask,
+					    unsigned int align_mask)
+{
+	void *va;
+
+	va = __page_frag_cache_prepare(nc, fragsz, gfp_mask, align_mask);
+	if (likely(va)) {
+		va += nc->offset;
+		nc->offset += fragsz;
+		nc->pagecnt_bias--;
+	}
+
+	return va;
+}
 
 static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
 					  unsigned int fragsz, gfp_t gfp_mask,
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index 3f7a203d35c6..729309aee27a 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -90,9 +90,9 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
 }
 EXPORT_SYMBOL(__page_frag_cache_drain);
 
-void *__page_frag_alloc_align(struct page_frag_cache *nc,
-			      unsigned int fragsz, gfp_t gfp_mask,
-			      unsigned int align_mask)
+void *__page_frag_cache_prepare(struct page_frag_cache *nc,
+				unsigned int fragsz, gfp_t gfp_mask,
+				unsigned int align_mask)
 {
 	unsigned long encoded_page = nc->encoded_page;
 	unsigned int size, offset;
@@ -151,12 +151,10 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 		offset = 0;
 	}
 
-	nc->pagecnt_bias--;
-	nc->offset = offset + fragsz;
-
-	return encoded_page_decode_virt(encoded_page) + offset;
+	nc->offset = offset;
+	return encoded_page_decode_virt(encoded_page);
 }
-EXPORT_SYMBOL(__page_frag_alloc_align);
+EXPORT_SYMBOL(__page_frag_cache_prepare);
 
 /*
  * Frees a page fragment allocated out of either a compound or order 0 page.