On 2024/8/28 2:16, Alexander Duyck wrote: > On Tue, Aug 27, 2024 at 5:06 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote: >> >> On 2024/8/27 0:46, Alexander Duyck wrote: >>> On Mon, Aug 26, 2024 at 5:46 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote: >>>> >>>> Currently there is one 'struct page_frag' for every 'struct >>>> sock' and 'struct task_struct', we are about to replace the >>>> 'struct page_frag' with 'struct page_frag_cache' for them. >>>> Before begin the replacing, we need to ensure the size of >>>> 'struct page_frag_cache' is not bigger than the size of >>>> 'struct page_frag', as there may be tens of thousands of >>>> 'struct sock' and 'struct task_struct' instances in the >>>> system. >>>> >>>> By or'ing the page order & pfmemalloc with lower bits of >>>> 'va' instead of using 'u16' or 'u32' for page size and 'u8' >>>> for pfmemalloc, we are able to avoid 3 or 5 bytes space waste. >>>> And page address & pfmemalloc & order is unchanged for the >>>> same page in the same 'page_frag_cache' instance, it makes >>>> sense to fit them together. >>>> >>>> After this patch, the size of 'struct page_frag_cache' should be >>>> the same as the size of 'struct page_frag'. >>>> >>>> CC: Alexander Duyck <alexander.duyck@xxxxxxxxx> >>>> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx> >>>> --- >>>> include/linux/mm_types_task.h | 19 ++++++----- >>>> include/linux/page_frag_cache.h | 60 +++++++++++++++++++++++++++++++-- >>>> mm/page_frag_cache.c | 51 +++++++++++++++------------- >>>> 3 files changed, 97 insertions(+), 33 deletions(-) >>>> > > ... > >>>> void page_frag_cache_drain(struct page_frag_cache *nc); >>> >>> So how many of these additions are actually needed outside of the >>> page_frag_cache.c file? I'm just wondering if we could look at moving >> >> At least page_frag_cache_is_pfmemalloc(), page_frag_encoded_page_order(), >> page_frag_encoded_page_ptr(), page_frag_encoded_page_address() are needed >> out of the page_frag_cache.c file for now, which are used mostly in >> __page_frag_cache_commit() and __page_frag_alloc_refill_probe_align() for >> debugging and performance reason, see patch 7 & 10. > > As far as the __page_frag_cache_commit I might say that could be moved > to page_frag_cache.c, but admittedly I don't know how much that would > impact the performance. The performance impact seems large enough that it does not seem to justify the moving to page_frag_cache.c, Before the moving: Performance counter stats for 'insmod page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=256 nr_test=512000000 test_align=0 test_prepare=0' (20 runs): 17.749582 task-clock (msec) # 0.002 CPUs utilized ( +- 0.15% ) 5 context-switches # 0.304 K/sec ( +- 2.48% ) 0 cpu-migrations # 0.017 K/sec ( +- 35.04% ) 76 page-faults # 0.004 M/sec ( +- 0.45% ) 46103462 cycles # 2.597 GHz ( +- 0.14% ) 60692196 instructions # 1.32 insn per cycle ( +- 0.12% ) 14734050 branches # 830.107 M/sec ( +- 0.12% ) 19792 branch-misses # 0.13% of all branches ( +- 0.75% ) 9.837758611 seconds time elapsed ( +- 0.38% ) After the moving: Performance counter stats for 'insmod page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=256 nr_test=512000000 test_align=0 test_prepare=0' (20 runs): 19.682296 task-clock (msec) # 0.002 CPUs utilized ( +- 4.08% ) 6 context-switches # 0.305 K/sec ( +- 3.42% ) 0 cpu-migrations # 0.000 K/sec 76 page-faults # 0.004 M/sec ( +- 0.44% ) 51128091 cycles # 2.598 GHz ( +- 4.08% ) 58833583 instructions # 1.15 insn per cycle ( +- 4.50% ) 14260855 branches # 724.552 M/sec ( +- 4.63% ) 20120 branch-misses # 0.14% of all branches ( +- 0.92% ) 12.318770150 seconds time elapsed ( +- 0.15% ) > >> The only left one is page_frag_encode_page(), I am not sure if it makes >> much sense to move it to page_frag_cache.c while the rest of them are in >> .h file. > > I would move it. There is no point in exposing internals more than > necessary. Also since you are carrying a BUILD_BUG_ON it would make > sense to keep that internal to your implementation.