On Mon, 30 May 2022 21:39:02 +0800 Chen Lin wrote:
> netdev_alloc_frag->page_frag_alloc may cause memory corruption in
> the following process:
>
> 1. A netdev_alloc_frag call needs to allocate 200 bytes to build an skb.
>
> 2. There is not enough memory to allocate PAGE_FRAG_CACHE_MAX_ORDER (32K)
>    in __page_frag_cache_refill to fill the frag cache, so a single page
>    (e.g. 4K) is allocated instead. The current frag cache is now 4K, the
>    allocation succeeds, and nc->pagecnt_bias--.
>
> 3. The 200-byte skb from step 1 is then freed, page->_refcount--.
>
> 4. Another netdev_alloc_frag call needs to allocate 5K. page->_refcount
>    is now equal to nc->pagecnt_bias, so the page count bias and offset
>    are reset to the start of a new frag. page_frag_alloc returns the 4K
>    of memory for the 5K request.
>
> 5. The caller writes to the extra 1K of memory that was never actually
>    allocated, causing memory corruption.
>
> page_frag_alloc is for fragmented allocation. We should warn the caller
> to avoid memory corruption.
>
> When fragsz is larger than one page, report the failure and return.
> I don't think it is a good idea to make an effort to support
> allocations of more than one page in this function, because the total
> frag cache size (PAGE_FRAG_CACHE_MAX_SIZE, 32768) is relatively small.
> When the request is larger than one page, the caller should switch to
> other kernel interfaces such as kmalloc and alloc_pages.
>
> This bug is mainly caused by the reuse of the previously allocated
> frag cache memory by a subsequent LARGER allocation. It existed before
> page_frag_alloc was ported from __netdev_alloc_frag in
> net/core/skbuff.c, so most Linux versions have this problem.
>
> Signed-off-by: Chen Lin <chen45464546@xxxxxxx>
> ---
>  mm/page_alloc.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e008a3d..1e9e2c4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5574,6 +5574,16 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
>  	struct page *page;
>  	int offset;
>  
> +	/* frag_alloc is not suitable for allocations where fragsz is
> +	 * bigger than PAGE_SIZE; use kmalloc or alloc_pages instead.
> +	 */
> +	if (unlikely(fragsz > PAGE_SIZE)) {
> +		WARN(1, "alloc fragsz(%d) > PAGE_SIZE(%ld) not supported, alloc fail\n",
> +		     fragsz, PAGE_SIZE);
> +
> +		return NULL;
> +	}
> +
>  	if (unlikely(!nc->va)) {
>  refill:
>  		page = __page_frag_cache_refill(nc, gfp_mask);

Let's see what Alex says (fixing his email now).

It seems a little too drastic to me. I'd go with something like:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e008a3df0485..360a545ee5e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5537,6 +5537,7 @@ EXPORT_SYMBOL(free_pages);
  * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
  */
 static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
+					     unsigned int fragsz,
 					     gfp_t gfp_mask)
 {
 	struct page *page = NULL;
@@ -5549,7 +5550,7 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
 				PAGE_FRAG_CACHE_MAX_ORDER);
 	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
 #endif
-	if (unlikely(!page))
+	if (unlikely(!page && fragsz <= PAGE_SIZE))
 		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
 
 	nc->va = page ? page_address(page) : NULL;
@@ -5576,7 +5577,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
 
 	if (unlikely(!nc->va)) {
 refill:
-		page = __page_frag_cache_refill(nc, gfp_mask);
+		page = __page_frag_cache_refill(nc, fragsz, gfp_mask);
 		if (!page)
 			return NULL;
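
To make the failure mode in steps 1-5 above easier to see, here is a minimal
userspace sketch of the offset arithmetic involved. It is not the kernel code:
struct frag_cache_model, model_frag_alloc and MODEL_PAGE_SIZE are made-up
names, and the pagecnt_bias/_refcount handling is left out; only the "reset
offset to size - fragsz" step that lets a 5K request through a 4K cache is
modelled.

#include <stdio.h>

#define MODEL_PAGE_SIZE 4096

/*
 * Minimal userspace model of the offset bookkeeping in page_frag_alloc()
 * after the refill has fallen back to a single page. Only the fields
 * needed for the arithmetic are modelled; there is no real memory and no
 * pagecnt_bias/_refcount handling here.
 */
struct frag_cache_model {
	int size;	/* bytes actually backing the cache: 4K after fallback */
	int offset;	/* allocation offset, walks downwards from size */
};

/* Returns the offset the kernel would add to nc->va for this request. */
static int model_frag_alloc(struct frag_cache_model *nc, int fragsz)
{
	int offset = nc->offset - fragsz;

	if (offset < 0) {
		/* all previous frags were freed: restart at the top of the cache */
		offset = nc->size - fragsz;
		/* the bug under discussion: nothing checks fragsz <= size here */
	}

	nc->offset = offset;
	return offset;
}

int main(void)
{
	struct frag_cache_model nc = { .size = MODEL_PAGE_SIZE,
				       .offset = MODEL_PAGE_SIZE };

	/* steps 1-3: a 200-byte frag fits and is later freed */
	printf("200B frag at offset %d\n", model_frag_alloc(&nc, 200));

	/* step 4: a 5K request against the 4K cache "succeeds" with a
	 * negative offset, i.e. a pointer before the cached page (step 5) */
	printf("5K frag at offset %d\n", model_frag_alloc(&nc, 5 * 1024));

	return 0;
}

Run as-is, this prints an offset of 3896 for the 200-byte request and -1024
for the 5K one, i.e. a pointer 1K before the cached page, which is the
corruption described in step 5.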