On Mon, 20 Aug 2012, Mel Gorman wrote:
> On Sun, Aug 19, 2012 at 11:49:31AM -0700, Sage Weil wrote:
> > I've bisected and identified this commit:
> >
> >     netvm: propagate page->pfmemalloc to skb
> >
> >     The skb->pfmemalloc flag gets set to true iff during the slab allocation
> >     of data in __alloc_skb that the the PFMEMALLOC reserves were used.  If the
> >     packet is fragmented, it is possible that pages will be allocated from the
> >     PFMEMALLOC reserve without propagating this information to the skb.  This
> >     patch propagates page->pfmemalloc from pages allocated for fragments to
> >     the skb.
> >
> >     Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> >     Acked-by: David S. Miller <davem@xxxxxxxxxxxxx>
> >     Cc: Neil Brown <neilb@xxxxxxx>
> >     Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> >     Cc: Mike Christie <michaelc@xxxxxxxxxxx>
> >     Cc: Eric B Munson <emunson@xxxxxxxxx>
> >     Cc: Eric Dumazet <eric.dumazet@xxxxxxxxx>
> >     Cc: Sebastian Andrzej Siewior <sebastian@xxxxxxxxxxxxx>
> >     Cc: Mel Gorman <mgorman@xxxxxxx>
> >     Cc: Christoph Lameter <cl@xxxxxxxxx>
> >     Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> >     Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> >
>
> Ok, thanks.
>
> > I've retested several times and confirmed that this change leads to the
> > breakage, and also confirmed that reverting it on top of -rc1 also fixes
> > the problem.
> >
> > I've also added some additional instrumentation to my code and confirmed
> > that the process is blocking on poll(2) while netstat is reporting
> > data available on the socket.
> >
> > What can I do to help track this down?
> >
>
> Can the following patch be tested please? It is reported to fix an fio
> regression that may be similar to what you are experiencing but has not
> been picked up yet.

This patch appears to resolve things for me as well, at least after a
couple of passes.  I'll let you know if any further problems come up with
more testing.

Thanks!
sage

>
> ---8<---
> From: Alex Shi <alex.shi@xxxxxxxxx>
> Subject: [PATCH] mm: correct page->pfmemalloc to fix deactivate_slab regression
>
> commit cfd19c5a9ec (mm: only set page->pfmemalloc when
> ALLOC_NO_WATERMARKS was used) try to narrow down page->pfmemalloc
> setting, but it missed some places the pfmemalloc should be set.
>
> So, in __slab_alloc, the unalignment pfmemalloc and ALLOC_NO_WATERMARKS
> cause incorrect deactivate_slab() on our core2 server:
>
>     64.73%           fio  [kernel.kallsyms]     [k] _raw_spin_lock
>                      |
>                      --- _raw_spin_lock
>                         |
>                         |---0.34%-- deactivate_slab
>                         |          __slab_alloc
>                         |          kmem_cache_alloc
>                         |          |
>
> That causes our fio sync write performance has 40% regression.
>
> This patch move the checking in get_page_from_freelist, that resolved
> this issue.
>
> Signed-off-by: Alex Shi <alex.shi@xxxxxxxxx>
> ---
>  mm/page_alloc.c |   21 +++++++++++----------
>  1 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 009ac28..07f1924 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1928,6 +1928,17 @@ this_zone_full:
>  		zlc_active = 0;
>  		goto zonelist_scan;
>  	}
> +
> +	if (page)
> +		/*
> +		 * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
> +		 * necessary to allocate the page. The expectation is
> +		 * that the caller is taking steps that will free more
> +		 * memory. The caller should avoid the page being used
> +		 * for !PFMEMALLOC purposes.
> +		 */
> +		page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);
> +
>  	return page;
>  }
>  
> @@ -2389,14 +2400,6 @@ rebalance:
>  				zonelist, high_zoneidx, nodemask,
>  				preferred_zone, migratetype);
>  		if (page) {
> -			/*
> -			 * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
> -			 * necessary to allocate the page. The expectation is
> -			 * that the caller is taking steps that will free more
> -			 * memory. The caller should avoid the page being used
> -			 * for !PFMEMALLOC purposes.
> -			 */
> -			page->pfmemalloc = true;
>  			goto got_pg;
>  		}
>  	}
> @@ -2569,8 +2572,6 @@ retry_cpuset:
>  		page = __alloc_pages_slowpath(gfp_mask, order,
>  				zonelist, high_zoneidx, nodemask,
>  				preferred_zone, migratetype);
> -	else
> -		page->pfmemalloc = false;
>  
>  	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
>  
> -- 
> 1.7.5.4
>