Update documentation about design, implementation and API usages for page_frag. Also update MAINTAINERS for page_frag. Alexander seems to be the orginal author for page_frag, we can add him to the MAINTAINERS later if we have an ack from him. CC: Alexander Duyck <alexander.duyck@xxxxxxxxx> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx> --- Documentation/mm/page_frags.rst | 115 ++++++++++++++++++---------- MAINTAINERS | 10 +++ include/linux/page_frag_cache.h | 128 ++++++++++++++++++++++++++++++++ mm/page_frag_cache.c | 51 ++++++++++--- 4 files changed, 256 insertions(+), 48 deletions(-) diff --git a/Documentation/mm/page_frags.rst b/Documentation/mm/page_frags.rst index 503ca6cdb804..77256dfb58bf 100644 --- a/Documentation/mm/page_frags.rst +++ b/Documentation/mm/page_frags.rst @@ -1,43 +1,80 @@ +.. SPDX-License-Identifier: GPL-2.0 + ============== Page fragments ============== -A page fragment is an arbitrary-length arbitrary-offset area of memory -which resides within a 0 or higher order compound page. Multiple -fragments within that page are individually refcounted, in the page's -reference counter. - -The page_frag functions, page_frag_alloc and page_frag_free, provide a -simple allocation framework for page fragments. This is used by the -network stack and network device drivers to provide a backing region of -memory for use as either an sk_buff->head, or to be used in the "frags" -portion of skb_shared_info. - -In order to make use of the page fragment APIs a backing page fragment -cache is needed. This provides a central point for the fragment allocation -and tracks allows multiple calls to make use of a cached page. The -advantage to doing this is that multiple calls to get_page can be avoided -which can be expensive at allocation time. However due to the nature of -this caching it is required that any calls to the cache be protected by -either a per-cpu limitation, or a per-cpu limitation and forcing interrupts -to be disabled when executing the fragment allocation. - -The network stack uses two separate caches per CPU to handle fragment -allocation. The netdev_alloc_cache is used by callers making use of the -netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is -used by callers of the __napi_alloc_frag and napi_alloc_skb calls. The -main difference between these two calls is the context in which they may be -called. The "netdev" prefixed functions are usable in any context as these -functions will disable interrupts, while the "napi" prefixed functions are -only usable within the softirq context. - -Many network device drivers use a similar methodology for allocating page -fragments, but the page fragments are cached at the ring or descriptor -level. In order to enable these cases it is necessary to provide a generic -way of tearing down a page cache. For this reason __page_frag_cache_drain -was implemented. It allows for freeing multiple references from a single -page via a single call. The advantage to doing this is that it allows for -cleaning up the multiple references that were added to a page in order to -avoid calling get_page per allocation. - -Alexander Duyck, Nov 29, 2016. +.. kernel-doc:: mm/page_frag_cache.c + :doc: page_frag allocator + +Architecture overview +===================== + +.. code-block:: none + + +----------------------+ + | page_frag API caller | + +----------------------+ + ^ + | + | + | + v + +----------------------------------------------+ + | request page fragment | + +----------------------------------------------+ + ^ ^ + | | + | Cache empty or not enough | + | | + v | + +--------------------------------+ | + | refill cache with order 3 page | | + +--------------------------------+ | + ^ ^ | + | | | + | | Refill failed | + | | | Cache is enough + | | | + | v | + | +----------------------------------+ | + | | refill cache with order 0 page | | + | +----------------------------------+ | + | ^ | + | Refill succeed | | + | | Refill succeed | + | | | + v v v + +----------------------------------------------+ + | allocate fragment from cache | + +----------------------------------------------+ + +API interface +============= +As the design and implementation of page_frag API, the allocation side does not +allow concurrent calling, it is assumed that the caller must ensure there is not +concurrent alloc calling to the same page_frag_cache instance by using it's own +lock or rely on some lockless guarantee like NAPI softirq. + +Depending on different use cases, callers expecting to deal with va, page or +both va and page for them may call page_frag_alloc_va(), page_frag_alloc_pg(), +or page_frag_alloc() accordingly. + +There is also a use case that need minimum memory in order for forward +progressing, but can do better if there is more memory available. Introduce +page_frag_alloc_prepare() and page_frag_alloc_commit() related API, the caller +requests the minimum memory it need and the prepare API will return the maximum +size of the fragment returned, caller need to report back to the page_frag core +how much memory it actually use by calling commit API, or not calling the commit +API if deciding to not use any memory. + +.. kernel-doc:: include/linux/page_frag_cache.h + :identifiers: page_frag_cache_init page_frag_cache_is_pfmemalloc + page_frag_alloc_va __page_frag_alloc_va_align + page_frag_alloc_va_align page_frag_alloc_va_prepare + page_frag_alloc_va_prepare_align page_frag_alloc_pg_prepare + page_frag_alloc_prepare page_frag_alloc_commit + page_frag_alloc_commit_noref page_frag_free_va + +.. kernel-doc:: mm/page_frag_cache.c + :identifiers: page_frag_cache_drain diff --git a/MAINTAINERS b/MAINTAINERS index 4745ea94d463..2f84aba59428 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16683,6 +16683,16 @@ F: mm/page-writeback.c F: mm/readahead.c F: mm/truncate.c +PAGE FRAG +M: Yunsheng Lin <linyunsheng@xxxxxxxxxx> +L: linux-mm@xxxxxxxxx +L: netdev@xxxxxxxxxxxxxxx +S: Supported +F: Documentation/mm/page_frags.rst +F: include/linux/page_frag_cache.h +F: mm/page_frag_cache.c +F: mm/page_frag_test.c + PAGE POOL M: Jesper Dangaard Brouer <hawk@xxxxxxxxxx> M: Ilias Apalodimas <ilias.apalodimas@xxxxxxxxxx> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h index 28185969cd2c..d8edbecdd179 100644 --- a/include/linux/page_frag_cache.h +++ b/include/linux/page_frag_cache.h @@ -31,11 +31,23 @@ struct page_frag_cache { #endif }; +/** + * page_frag_cache_init() - Init page_frag cache. + * @nc: page_frag cache from which to init + * + * Inline helper to init the page_frag cache. + */ static inline void page_frag_cache_init(struct page_frag_cache *nc) { nc->va = NULL; } +/** + * page_frag_cache_is_pfmemalloc() - Check for pfmemalloc. + * @nc: page_frag cache from which to check + * + * Used to check if the current page in page_frag cache is pfmemalloc'ed. + */ static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc) { return !!nc->pfmemalloc; @@ -46,6 +58,17 @@ void __page_frag_cache_drain(struct page *page, unsigned int count); void *page_frag_cache_refill(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask); +/** + * page_frag_alloc_va() - Alloc a page fragment. + * @nc: page_frag cache from which to allocate + * @fragsz: the requested fragment size + * @gfp_mask: the allocation gfp to use when cache need to be refilled + * + * Get a page fragment from page_frag cache. + * + * Return: + * Return va of the page fragment, otherwise return NULL. + */ static inline void *page_frag_alloc_va(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask) { @@ -63,6 +86,19 @@ static inline void *page_frag_alloc_va(struct page_frag_cache *nc, return va + offset; } +/** + * __page_frag_alloc_va_align() - Alloc a page fragment with aligning + * requirement. + * @nc: page_frag cache from which to allocate + * @fragsz: the requested fragment size + * @gfp_mask: the allocation gfp to use when cache need to be refilled + * @align: the requested aligning requirement + * + * Get a page fragment from page_frag cache with aligning requirement. + * + * Return: + * Return va of the page fragment, otherwise return NULL. + */ static inline void *__page_frag_alloc_va_align(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask, @@ -75,6 +111,19 @@ static inline void *__page_frag_alloc_va_align(struct page_frag_cache *nc, return page_frag_alloc_va(nc, fragsz, gfp_mask); } +/** + * page_frag_alloc_va_align() - Alloc a page fragment with aligning requirement. + * @nc: page_frag cache from which to allocate + * @fragsz: the requested fragment size + * @gfp_mask: the allocation gfp to use when cache need to be refilled + * @align: the requested aligning requirement + * + * WARN_ON_ONCE() checking for align and fragsz before getting a page fragment + * from page_frag cache with aligning requirement. + * + * Return: + * Return va of the page fragment, otherwise return NULL. + */ static inline void *page_frag_alloc_va_align(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask, @@ -86,6 +135,19 @@ static inline void *page_frag_alloc_va_align(struct page_frag_cache *nc, return __page_frag_alloc_va_align(nc, fragsz, gfp_mask, align); } +/** + * page_frag_alloc_va_prepare() - Prepare allocing a page fragment. + * @nc: page_frag cache from which to prepare + * @offset: out as the offset of the page fragment + * @size: in as the requested size, out as the available size + * @gfp_mask: the allocation gfp to use when cache need to be refilled + * + * Prepare a page fragment with minimum size of ‘size’, 'size' is also used to + * report the maximum size of the page fragment the caller can use. + * + * Return: + * Return va of the page fragment, otherwise return NULL. + */ static inline void *page_frag_alloc_va_prepare(struct page_frag_cache *nc, unsigned int *offset, unsigned int *size, @@ -108,6 +170,21 @@ static inline void *page_frag_alloc_va_prepare(struct page_frag_cache *nc, return va + *offset; } +/** + * page_frag_alloc_va_prepare_align() - Prepare allocing a page fragment with + * aligning requirement. + * @nc: page_frag cache from which to prepare + * @offset: out as the offset of the page fragment + * @size: in as the requested size, out as the available size + * @align: the requested aligning requirement + * @gfp_mask: the allocation gfp to use when cache need to be refilled + * + * Prepare an aligned page fragment with minimum size of ‘size’, 'size' is also + * used to report the maximum size of the page fragment the caller can use. + * + * Return: + * Return va of the page fragment, otherwise return NULL. + */ static inline void *page_frag_alloc_va_prepare_align(struct page_frag_cache *nc, unsigned int *offset, unsigned int *size, @@ -144,6 +221,19 @@ static inline void *__page_frag_alloc_pg_prepare(struct page_frag_cache *nc, return va; } +/** + * page_frag_alloc_pg_prepare - Prepare allocing a page fragment. + * @nc: page_frag cache from which to prepare + * @offset: out as the offset of the page fragment + * @size: in as the requested size, out as the available size + * @gfp: the allocation gfp to use when cache need to be refilled + * + * Prepare a page fragment with minimum size of ‘size’, 'size' is also used to + * report the maximum size of the page fragment the caller can use. + * + * Return: + * Return the page fragment, otherwise return NULL. + */ #define page_frag_alloc_pg_prepare(nc, offset, size, gfp) \ ({ \ struct page *__page = NULL; \ @@ -179,6 +269,21 @@ static inline void *__page_frag_alloc_prepare(struct page_frag_cache *nc, return nc_va; } +/** + * page_frag_alloc_prepare - Prepare allocing a page fragment. + * @nc: page_frag cache from which to prepare + * @offset: out as the offset of the page fragment + * @size: in as the requested size, out as the available size + * @va: out as the va of the returned page fragment + * @gfp: the allocation gfp to use when cache need to be refilled + * + * Prepare a page fragment with minimum size of ‘size’, 'size' is also used to + * report the maximum size of the page fragment. Return both 'page' and 'va' of + * the fragment to the caller. + * + * Return: + * Return the page fragment, otherwise return NULL. + */ #define page_frag_alloc_prepare(nc, offset, size, va, gfp) \ ({ \ struct page *__page = NULL; \ @@ -191,6 +296,14 @@ static inline void *__page_frag_alloc_prepare(struct page_frag_cache *nc, __page; \ }) +/** + * page_frag_alloc_commit - Commit allocing a page fragment. + * @nc: page_frag cache from which to commit + * @offset: offset of the page fragment + * @size: size of the page fragment has been used + * + * Commit the alloc preparing by passing offset and the actual used size. + */ static inline void page_frag_alloc_commit(struct page_frag_cache *nc, unsigned int offset, unsigned int size) @@ -199,6 +312,17 @@ static inline void page_frag_alloc_commit(struct page_frag_cache *nc, nc->offset = offset + size; } +/** + * page_frag_alloc_commit_noref - Commit allocing a page fragment without taking + * page refcount. + * @nc: page_frag cache from which to commit + * @offset: offset of the page fragment + * @size: size of the page fragment has been used + * + * Commit the alloc preparing by passing offset and the actual used size, but + * not taking page refcount. Mostly used for fragmemt coaleasing case when the + * current fragmemt can share the same refcount with previous fragmemt. + */ static inline void page_frag_alloc_commit_noref(struct page_frag_cache *nc, unsigned int offset, unsigned int size) @@ -206,6 +330,10 @@ static inline void page_frag_alloc_commit_noref(struct page_frag_cache *nc, nc->offset = offset + size; } +/** + * page_frag_free_va - Free a page fragment by va. + * @addr: va of page fragment to be freed + */ void page_frag_free_va(void *addr); #endif diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c index cbd0ed82a596..0c76ec006c22 100644 --- a/mm/page_frag_cache.c +++ b/mm/page_frag_cache.c @@ -1,15 +1,44 @@ // SPDX-License-Identifier: GPL-2.0-only -/* Page fragment allocator + +/** + * DOC: page_frag allocator + * + * A page fragment is an arbitrary-length arbitrary-offset area of memory which + * resides within a 0 or higher order compound page. Multiple fragments within + * that page are individually refcounted, in the page's reference counter. + * + * The page_frag functions, page_frag_alloc* and page_frag_free*, provide a + * simple allocation framework for page fragments. This is used by the network + * stack and network device drivers to provide a backing region of memory for + * use as either an sk_buff->head, or to be used in the "frags" portion of + * skb_shared_info. * - * Page Fragment: - * An arbitrary-length arbitrary-offset area of memory which resides within a - * 0 or higher order page. Multiple fragments within that page are - * individually refcounted, in the page's reference counter. + * In order to make use of the page fragment APIs a backing page fragment cache + * is needed. This provides a central point for the fragment allocation and + * tracks allows multiple calls to make use of a cached page. The advantage to + * doing this is that multiple calls to get_page can be avoided which can be + * expensive at allocation time. However due to the nature of this caching it + * is required that any calls to the cache be protected by either a per-cpu + * limitation, or a per-cpu limitation and forcing interrupts to be disabled + * when executing the fragment allocation. * - * The page_frag functions provide a simple allocation framework for page - * fragments. This is used by the network stack and network device drivers to - * provide a backing region of memory for use as either an sk_buff->head, or to - * be used in the "frags" portion of skb_shared_info. + * The network stack uses two separate caches per CPU to handle fragment + * allocation. The netdev_alloc_cache is used by callers making use of the + * netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is + * used by callers of the __napi_alloc_frag and napi_alloc_skb calls. The + * main difference between these two calls is the context in which they may be + * called. The "netdev" prefixed functions are usable in any context as these + * functions will disable interrupts, while the "napi" prefixed functions are + * only usable within the softirq context. + * + * Many network device drivers use a similar methodology for allocating page + * fragments, but the page fragments are cached at the ring or descriptor + * level. In order to enable these cases it is necessary to provide a generic + * way of tearing down a page cache. For this reason __page_frag_cache_drain + * was implemented. It allows for freeing multiple references from a single + * page via a single call. The advantage to doing this is that it allows for + * cleaning up the multiple references that were added to a page in order to + * avoid calling get_page per allocation. */ #include <linux/export.h> @@ -57,6 +86,10 @@ static bool __page_frag_cache_refill(struct page_frag_cache *nc, return true; } +/** + * page_frag_cache_drain - Drain the current page from page_frag cache. + * @nc: page_frag cache from which to drain + */ void page_frag_cache_drain(struct page_frag_cache *nc) { if (!nc->va) -- 2.33.0