On 2024/7/2 7:27, Alexander H Duyck wrote: > On Tue, 2024-06-25 at 21:52 +0800, Yunsheng Lin wrote: >> We are above to use page_frag_alloc_*() API to not just > "about to use", not "above to use" Ack. > >> allocate memory for skb->data, but also use them to do >> the memory allocation for skb frag too. Currently the >> implementation of page_frag in mm subsystem is running >> the offset as a countdown rather than count-up value, >> there may have several advantages to that as mentioned >> in [1], but it may have some disadvantages, for example, >> it may disable skb frag coaleasing and more correct cache >> prefetching >> ... >> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c >> index 88f567ef0e29..da244851b8a4 100644 >> --- a/mm/page_frag_cache.c >> +++ b/mm/page_frag_cache.c >> @@ -72,10 +72,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc, >> if (!page) >> return NULL; >> >> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) >> - /* if size can vary use size else just use PAGE_SIZE */ >> - size = nc->size; >> -#endif >> /* Even if we own the page, we do not use atomic_set(). >> * This would break get_page_unless_zero() users. >> */ >> @@ -84,11 +80,16 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc, >> /* reset page count bias and offset to start of new frag */ >> nc->pfmemalloc = page_is_pfmemalloc(page); >> nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; >> - nc->offset = size; >> + nc->offset = 0; >> } >> >> - offset = nc->offset - fragsz; >> - if (unlikely(offset < 0)) { >> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) >> + /* if size can vary use size else just use PAGE_SIZE */ >> + size = nc->size; >> +#endif >> + >> + offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask); >> + if (unlikely(offset + fragsz > size)) { > > The fragsz check below could be moved to here. > >> page = virt_to_page(nc->va); >> >> if (!page_ref_sub_and_test(page, nc->pagecnt_bias)) >> @@ -99,17 +100,13 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc, >> goto refill; >> } >> >> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) >> - /* if size can vary use size else just use PAGE_SIZE */ >> - size = nc->size; >> -#endif >> /* OK, page count is 0, we can safely set it */ >> set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1); >> >> /* reset page count bias and offset to start of new frag */ >> nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; >> - offset = size - fragsz; >> - if (unlikely(offset < 0)) { >> + offset = 0; >> + if (unlikely(fragsz > PAGE_SIZE)) { > > Since we aren't taking advantage of the flag that is left after the > subtraction we might just want to look at moving this piece up to just > after the offset + fragsz check. That should prevent us from trying to > refill if we have a request that is larger than a single page. In > addition we could probably just drop the 3 PAGE_SIZE checks above as > they would be redundant. I am not sure I understand the 'drop the 3 PAGE_SIZE checks' part and the 'redundant' part, where is the '3 PAGE_SIZE checks'? And why they are redundant? > >> /* >> * The caller is trying to allocate a fragment >> * with fragsz > PAGE_SIZE but the cache isn't big >> @@ -124,8 +121,7 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc, >> } >> >> nc->pagecnt_bias--; >> - offset &= align_mask; >> - nc->offset = offset; >> + nc->offset = offset + fragsz; >> >> return nc->va + offset; >> } > > > . >