On 11/3/20 1:15 PM, Matthew Wilcox wrote: > On Tue, Nov 03, 2020 at 12:57:33PM -0800, Dongli Zhang wrote: >> On 11/3/20 12:35 PM, Matthew Wilcox wrote: >>> On Tue, Nov 03, 2020 at 11:32:39AM -0800, Dongli Zhang wrote: >>>> However, once kernel is not under memory pressure any longer (suppose large >>>> amount of memory pages are just reclaimed), the page_frag_alloc() may still >>>> re-use the prior pfmemalloc page_frag_cache->va to allocate skb->data. As a >>>> result, the skb->pfmemalloc is always true unless page_frag_cache->va is >>>> re-allocated, even the kernel is not under memory pressure any longer. >>>> + /* >>>> + * Try to avoid re-using pfmemalloc page because kernel may already >>>> + * run out of the memory pressure situation at any time. >>>> + */ >>>> + if (unlikely(nc->va && nc->pfmemalloc)) { >>>> + page = virt_to_page(nc->va); >>>> + __page_frag_cache_drain(page, nc->pagecnt_bias); >>>> + nc->va = NULL; >>>> + } >>> >>> I think this is the wrong way to solve this problem. Instead, we should >>> use up this page, but refuse to recycle it. How about something like this (not even compile tested): >> >> Thank you very much for the feedback. Yes, the option is to use the same page >> until it is used up (offset < 0). Instead of recycling it, the kernel free it >> and allocate new one. >> >> This depends on whether we will tolerate the packet drop until this page is used up. >> >> For virtio-net, the payload (skb->data) is of size 128-byte. The padding and >> alignment will finally make it as 512-byte. >> >> Therefore, for virtio-net, we will have at most 4096/512-1=7 packets dropped >> before the page is used up. > > My thinking is that if the kernel is under memory pressure then freeing > the page and allocating a new one is likely to put even more strain > on the memory allocator, so we want to do this "soon", rather than at > each allocation. > > Thanks for providing the numbers. Do you think that dropping (up to) > 7 packets is acceptable?> > We could also do something like ... > > if (unlikely(nc->pfmemalloc)) { > page = alloc_page(GFP_NOWAIT | __GFP_NOWARN); > if (page) > nc->pfmemalloc = 0; > put_page(page); > } > > to test if the memory allocator has free pages at the moment. Not sure > whether that's a good idea or not -- hopefully you have a test environment > set up where you can reproduce this condition on demand and determine > which of these three approaches is best! > >From mm's perspective, we expect to reduce the number of page allocation (especially under memory pressure). >From networking's perspective, we expect to reduce the number of skb drop. That's why I CCed netdev folks (including David and Jakub), although the patch is for mm/page_alloc.c. The page_frag_alloc() is primarily used by networking and nvme-tcp. Unfortunately, so far I do not have the env to reproduce. I reproduced with a patch to fail page allocation and set nc->pfmemalloc on purpose. >From mm's perspective, I think to use up the page is a good option. Indeed tt is system administrator's duty to avoid memory pressure, in order to avoid the extra packet drops. Dongli Zhang