On 5/8/24 6:34 AM, Yunsheng Lin wrote:
> Update documentation about design, implementation and API usages
> for page_frag.
>
> CC: Alexander Duyck <alexander.duyck@xxxxxxxxx>
> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
> ---
>  Documentation/mm/page_frags.rst | 156 +++++++++++++++++++++++++++++++-
>  include/linux/page_frag_cache.h |  96 ++++++++++++++++++++
>  mm/page_frag_cache.c            |  65 ++++++++++++-
>  3 files changed, 314 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/mm/page_frags.rst b/Documentation/mm/page_frags.rst
> index 503ca6cdb804..9c25c0fd81f0 100644
> --- a/Documentation/mm/page_frags.rst
> +++ b/Documentation/mm/page_frags.rst
> @@ -1,3 +1,5 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
>  ==============
>  Page fragments
>  ==============
> @@ -40,4 +42,156 @@ page via a single call. The advantage to doing this is that it allows for
>  cleaning up the multiple references that were added to a page in order to
>  avoid calling get_page per allocation.
>
> -Alexander Duyck, Nov 29, 2016.
> +
> +Architecture overview
> +=====================
> +
> +.. code-block:: none
> +
> +    +----------------------+
> +    | page_frag API caller |
> +    +----------------------+
> +               ^
> +               |
> +               |
> +               |
> +               v
> +    +------------------------------------------------+
> +    |             request page fragment              |
> +    +------------------------------------------------+
> +        ^                  ^                       ^
> +        |                  | Cache not enough      |
> +        |   Cache empty    v                       |
> +        |         +-----------------+              |
> +        |         | drain old cache |              |
> +        |         +-----------------+              |
> +        |                  ^                       |
> +        |                  |                       |
> +        v                  v                       |
> +    +----------------------------------+           |
> +    |  refill cache with order 3 page  |           |
> +    +----------------------------------+           |
> +        ^                  ^                       |
> +        |                  |                       |
> +        |                  | Refill failed         |
> +        |                  |                       | Cache is enough
> +        |                  |                       |
> +        |                  v                       |
> +        | +----------------------------------+     |
> +        | |  refill cache with order 0 page  |     |
> +        | +----------------------------------+     |
> +        |                  ^                       |
> +        | Refill succeed   |                       |
> +        |                  | Refill succeed        |
> +        |                  |                       |
> +        v                  v                       v
> +    +------------------------------------------------+
> +    |          allocate fragment from cache          |
> +    +------------------------------------------------+
> +
> +API interface
> +=============
> +As the design and implementation of page_frag API implies, the allocation side
> +does not allow concurrent calling. Instead it is assumed that the caller must
> +ensure there is not concurrent alloc calling to the same page_frag_cache
> +instance by using its own lock or rely on some lockless guarantee like NAPI
> +softirq.
> +
> +Depending on different aligning requirement, the page_frag API caller may call
> +page_frag_alloc*_align*() to ensure the returned virtual address or offset of
> +the page is aligned according to the 'align/alignment' parameter. Note the size
> +of the allocated fragment is not aligned, the caller need to provide a aligned

needs to provide an aligned

> +fragsz if there is a alignment requirement for the size of the fragment.
an alignment > + > +Depending on different use cases, callers expecting to deal with va, page or > +both va and page for them may call page_frag_alloc_va*, page_frag_alloc_pg*, > +or page_frag_alloc* API accordingly. > + > +There is also a use case that need minimum memory in order for forward needs > +progressing, but more performant if more memory is available. Using progress, > +page_frag_alloc_prepare() and page_frag_alloc_commit() related API, the caller > +requests the minimum memory it need and the prepare API will return the maximum needs > +size of the fragment returned, the caller needs to either call the commit API to returned. The caller > +report how much memory it actually uses, or not do so if deciding to not use any > +memory. > + > +.. kernel-doc:: include/linux/page_frag_cache.h > + :identifiers: page_frag_cache_init page_frag_cache_is_pfmemalloc > + page_frag_cache_page_offset page_frag_alloc_va > + page_frag_alloc_va_align page_frag_alloc_va_prepare_align > + page_frag_alloc_probe page_frag_alloc_commit > + page_frag_alloc_commit_noref > + > +.. kernel-doc:: mm/page_frag_cache.c > + :identifiers: __page_frag_alloc_va_align page_frag_alloc_va_prepare > + page_frag_alloc_pg_prepare page_frag_alloc_prepare > + page_frag_cache_drain page_frag_free_va > + > +Coding examples > +=============== > + > +Init & Drain API > +---------------- > + > +.. code-block:: c > + > + page_frag_cache_init(pfrag); > + ... > + page_frag_cache_drain(pfrag); > + > + > +Alloc & Free API > +---------------- > + > +.. code-block:: c > + > + void *va; > + > + va = page_frag_alloc_va_align(pfrag, size, gfp, align); > + if (!va) > + goto do_error; > + > + err = do_something(va, size); > + if (err) { > + page_frag_free_va(va); > + goto do_error; > + } > + > +Prepare & Commit API > +-------------------- > + > +.. 
code-block:: c > + > + unsigned int offset, size; > + bool merge = true; > + struct page *page; > + void *va; > + > + size = 32U; > + page = page_frag_alloc_prepare(pfrag, &offset, &size, &va); > + if (!page) > + goto wait_for_space; > + > + copy = min_t(int, copy, size); declare copy? > + if (!skb_can_coalesce(skb, i, page, offset)) { > + if (i >= max_skb_frags) > + goto new_segment; > + > + merge = false; > + } > + > + copy = mem_schedule(copy); > + if (!copy) > + goto wait_for_space; > + > + err = copy_from_iter_full_nocache(va, copy, iter); > + if (err) > + goto do_error; > + > + if (merge) { > + skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); > + page_frag_alloc_commit_noref(pfrag, offset, copy); > + } else { > + skb_fill_page_desc(skb, i, page, offset, copy); > + page_frag_alloc_commit(pfrag, offset, copy); > + } > diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h > index 30893638155b..8925397262a1 100644 > --- a/include/linux/page_frag_cache.h > +++ b/include/linux/page_frag_cache.h > @@ -61,11 +61,28 @@ struct page_frag_cache { > #endif > }; > > +/** > + * page_frag_cache_init() - Init page_frag cache. > + * @nc: page_frag cache from which to init > + * > + * Inline helper to init the page_frag cache. > + */ > static inline void page_frag_cache_init(struct page_frag_cache *nc) > { > memset(nc, 0, sizeof(*nc)); > } > > +/** > + * page_frag_cache_is_pfmemalloc() - Check for pfmemalloc. > + * @nc: page_frag cache from which to check > + * > + * Used to check if the current page in page_frag cache is pfmemalloc'ed. > + * It has the same calling context expection as the alloc API. > + * > + * Return: > + * Return true if the current page in page_frag cache is pfmemalloc'ed, Drop the (second) word "Return"... > + * otherwise return false. 
> + */
>  static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
>  {
>          return encoded_page_pfmemalloc(nc->encoded_va);
> @@ -92,6 +109,19 @@ void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
>                                   unsigned int fragsz, gfp_t gfp_mask,
>                                   unsigned int align_mask);
>
> +/**
> + * page_frag_alloc_va_align() - Alloc a page fragment with aligning requirement.
> + * @nc: page_frag cache from which to allocate
> + * @fragsz: the requested fragment size
> + * @gfp_mask: the allocation gfp to use when cache need to be refilled

needs

> + * @align: the requested aligning requirement for 'va'

or @va

> + *
> + * WARN_ON_ONCE() checking for 'align' before allocing a page fragment from
> + * page_frag cache with aligning requirement for 'va'.

or @va.

> + *
> + * Return:
> + * Return va of the page fragment, otherwise return NULL.

Drop the second "Return".

> + */
>  static inline void *page_frag_alloc_va_align(struct page_frag_cache *nc,
>                                               unsigned int fragsz,
>                                               gfp_t gfp_mask, unsigned int align)
> @@ -100,11 +130,32 @@ static inline void *page_frag_alloc_va_align(struct page_frag_cache *nc,
>          return __page_frag_alloc_va_align(nc, fragsz, gfp_mask, -align);
>  }
>
> +/**
> + * page_frag_cache_page_offset() - Return the current page fragment's offset.
> + * @nc: page_frag cache from which to check
> + *
> + * The API is only used in net/sched/em_meta.c for historical reason, do not use

reasons; do not use

> + * it for new caller unless there is a strong reason.

callers

> + *
> + * Return:
> + * Return the offset of the current page fragment in the page_frag cache.

Drop second "Return".

> + */
>  static inline unsigned int page_frag_cache_page_offset(const struct page_frag_cache *nc)
>  {
>          return __page_frag_cache_page_offset(nc->encoded_va, nc->remaining);
>  }
>
> +/**
> + * page_frag_alloc_va() - Alloc a page fragment.
> + * @nc: page_frag cache from which to allocate
> + * @fragsz: the requested fragment size
> + * @gfp_mask: the allocation gfp to use when cache need to be refilled

needs

> + *
> + * Get a page fragment from page_frag cache.
> + *
> + * Return:
> + * Return va of the page fragment, otherwise return NULL.

Drop second "Return".

> + */
>  static inline void *page_frag_alloc_va(struct page_frag_cache *nc,
>                                         unsigned int fragsz, gfp_t gfp_mask)
>  {
> @@ -114,6 +165,21 @@ static inline void *page_frag_alloc_va(struct page_frag_cache *nc,
>  void *page_frag_alloc_va_prepare(struct page_frag_cache *nc, unsigned int *fragsz,
>                                   gfp_t gfp);
>
> +/**
> + * page_frag_alloc_va_prepare_align() - Prepare allocing a page fragment with
> + * aligning requirement.
> + * @nc: page_frag cache from which to prepare
> + * @fragsz: in as the requested size, out as the available size
> + * @gfp: the allocation gfp to use when cache need to be refilled

needs

> + * @align: the requested aligning requirement for 'va'

or @va

> + *
> + * WARN_ON_ONCE() checking for 'align' before preparing an aligned page fragment
> + * with minimum size of ‘fragsz’, 'fragsz' is also used to report the maximum

'fragsz'. 'fragsz' is
(don't use fancy single quote marks above)

> + * size of the page fragment the caller can use.
> + *
> + * Return:
> + * Return va of the page fragment, otherwise return NULL.

Drop second "Return".

> + */
>  static inline void *page_frag_alloc_va_prepare_align(struct page_frag_cache *nc,
>                                                       unsigned int *fragsz,
>                                                       gfp_t gfp,
> @@ -148,6 +214,19 @@ static inline struct encoded_va *__page_frag_alloc_probe(struct page_frag_cache
>          return encoded_va;
>  }
>
> +/**
> + * page_frag_alloc_probe - Probe the avaiable page fragment.

available

> + * @nc: page_frag cache from which to probe
> + * @offset: out as the offset of the page fragment
> + * @fragsz: in as the requested size, out as the available size
> + * @va: out as the virtual address of the returned page fragment
> + *
> + * Probe the current available memory to caller without doing cache refilling.
> + * If the cache is empty, return NULL.
> + *
> + * Return:
> + * Return the page fragment, otherwise return NULL.

Drop the second "Return".

> + */
>  #define page_frag_alloc_probe(nc, offset, fragsz, va) \
>  ({ \
>          struct encoded_va *__encoded_va; \
> @@ -162,6 +241,13 @@ static inline struct encoded_va *__page_frag_alloc_probe(struct page_frag_cache
>          __page; \
>  })
>
> +/**
> + * page_frag_alloc_commit - Commit allocing a page fragment.
> + * @nc: page_frag cache from which to commit
> + * @fragsz: size of the page fragment has been used
> + *
> + * Commit the alloc preparing by passing the actual used size.
> + */
>  static inline void page_frag_alloc_commit(struct page_frag_cache *nc,
>                                            unsigned int fragsz)
>  {
> @@ -170,6 +256,16 @@ static inline void page_frag_alloc_commit(struct page_frag_cache *nc,
>          nc->remaining -= fragsz;
>  }
>
> +/**
> + * page_frag_alloc_commit_noref - Commit allocing a page fragment without taking
> + * page refcount.
> + * @nc: page_frag cache from which to commit
> + * @fragsz: size of the page fragment has been used
> + *
> + * Commit the alloc preparing by passing the actual used size, but not taking
> + * page refcount. Mostly used for fragmemt coaleasing case when the current

fragment coalescing

> + * fragmemt can share the same refcount with previous fragmemt.

fragment fragment.

> + */
>  static inline void page_frag_alloc_commit_noref(struct page_frag_cache *nc,
>                                                  unsigned int fragsz)
>  {
> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> index eb8bf59b26bb..85e23d5cbdcc 100644
> --- a/mm/page_frag_cache.c
> +++ b/mm/page_frag_cache.c
> @@ -89,6 +89,18 @@ static struct page *page_frag_cache_refill(struct page_frag_cache *nc,
>          return __page_frag_cache_refill(nc, gfp_mask);
>  }
>
> +/**
> + * page_frag_alloc_va_prepare() - Prepare allocing a page fragment.
> + * @nc: page_frag cache from which to prepare
> + * @fragsz: in as the requested size, out as the available size
> + * @gfp: the allocation gfp to use when cache need to be refilled

needs

> + *
> + * Prepare a page fragment with minimum size of ‘fragsz’, 'fragsz' is also used

'fragsz'. 'fragsz'
(don't use fancy single quote marks)

> + * to report the maximum size of the page fragment the caller can use.
> + *
> + * Return:
> + * Return va of the page fragment, otherwise return NULL.

Drop second "Return".

> + */
>  void *page_frag_alloc_va_prepare(struct page_frag_cache *nc,
>                                   unsigned int *fragsz, gfp_t gfp)
>  {
> @@ -111,6 +123,19 @@ void *page_frag_alloc_va_prepare(struct page_frag_cache *nc,
>  }
>  EXPORT_SYMBOL(page_frag_alloc_va_prepare);
>
> +/**
> + * page_frag_alloc_pg_prepare - Prepare allocing a page fragment.
> + * @nc: page_frag cache from which to prepare
> + * @offset: out as the offset of the page fragment
> + * @fragsz: in as the requested size, out as the available size
> + * @gfp: the allocation gfp to use when cache need to be refilled
> + *
> + * Prepare a page fragment with minimum size of ‘fragsz’, 'fragsz' is also used

'fragsz'. 'fragsz'
(don't use fancy single quote marks)

> + * to report the maximum size of the page fragment the caller can use.
> + *
> + * Return:
> + * Return the page fragment, otherwise return NULL.

Drop second "Return".

> + */
>  struct page *page_frag_alloc_pg_prepare(struct page_frag_cache *nc,
>                                          unsigned int *offset,
>                                          unsigned int *fragsz, gfp_t gfp)
> @@ -141,6 +166,21 @@ struct page *page_frag_alloc_pg_prepare(struct page_frag_cache *nc,
>  }
>  EXPORT_SYMBOL(page_frag_alloc_pg_prepare);
>
> +/**
> + * page_frag_alloc_prepare - Prepare allocing a page fragment.
> + * @nc: page_frag cache from which to prepare
> + * @offset: out as the offset of the page fragment
> + * @fragsz: in as the requested size, out as the available size
> + * @va: out as the virtual address of the returned page fragment
> + * @gfp: the allocation gfp to use when cache need to be refilled
> + *
> + * Prepare a page fragment with minimum size of ‘fragsz’, 'fragsz' is also used

'fragsz'. 'fragsz'
(don't use fancy single quote marks)

You could also (in several places) refer to the variables as @fragsz.

@fragsz

> + * to report the maximum size of the page fragment. Return both 'page' and 'va'
> + * of the fragment to the caller.
> + *
> + * Return:
> + * Return the page fragment, otherwise return NULL.

Drop second "Return".
But the paragraph above says that both @page and @va are returned.
How is that done?

> + */
>  struct page *page_frag_alloc_prepare(struct page_frag_cache *nc,
>                                       unsigned int *offset,
>                                       unsigned int *fragsz,
> @@ -173,6 +213,10 @@ struct page *page_frag_alloc_prepare(struct page_frag_cache *nc,
>  }
>  EXPORT_SYMBOL(page_frag_alloc_prepare);
>
> +/**
> + * page_frag_cache_drain - Drain the current page from page_frag cache.
> + * @nc: page_frag cache from which to drain
> + */
>  void page_frag_cache_drain(struct page_frag_cache *nc)
>  {
>          if (!nc->encoded_va)
> @@ -193,6 +237,19 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
>  }
>  EXPORT_SYMBOL(__page_frag_cache_drain);
>
> +/**
> + * __page_frag_alloc_va_align() - Alloc a page fragment with aligning
> + * requirement.
> + * @nc: page_frag cache from which to allocate
> + * @fragsz: the requested fragment size
> + * @gfp_mask: the allocation gfp to use when cache need to be refilled
> + * @align_mask: the requested aligning requirement for the 'va'
> + *
> + * Get a page fragment from page_frag cache with aligning requirement.
> + *
> + * Return:
> + * Return va of the page fragment, otherwise return NULL.

Drop the second "Return".

> + */
>  void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
>                                   unsigned int fragsz, gfp_t gfp_mask,
>                                   unsigned int align_mask)
> @@ -263,8 +320,12 @@ void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
>  }
>  EXPORT_SYMBOL(__page_frag_alloc_va_align);
>
> -/*
> - * Frees a page fragment allocated out of either a compound or order 0 page.
> +/**
> + * page_frag_free_va - Free a page fragment.
> + * @addr: va of page fragment to be freed
> + *
> + * Free a page fragment allocated out of either a compound or order 0 page by
> + * virtual address.
>   */
>  void page_frag_free_va(void *addr)
>  {

thanks.
-- 
#Randy
https://people.kernel.org/tglx/notes-about-netiquette
https://subspace.kernel.org/etiquette.html