Re: regression with poll(2)

On Mon, 20 Aug 2012, Mel Gorman wrote:
> On Sun, Aug 19, 2012 at 11:49:31AM -0700, Sage Weil wrote:
> > I've bisected and identified this commit:
> > 
> >     netvm: propagate page->pfmemalloc to skb
> >     
> >     The skb->pfmemalloc flag gets set to true iff during the slab allocation
> >     of data in __alloc_skb that the PFMEMALLOC reserves were used.  If the
> >     packet is fragmented, it is possible that pages will be allocated from the
> >     PFMEMALLOC reserve without propagating this information to the skb.  This
> >     patch propagates page->pfmemalloc from pages allocated for fragments to
> >     the skb.
> >     
> >     Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> >     Acked-by: David S. Miller <davem@xxxxxxxxxxxxx>
> >     Cc: Neil Brown <neilb@xxxxxxx>
> >     Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> >     Cc: Mike Christie <michaelc@xxxxxxxxxxx>
> >     Cc: Eric B Munson <emunson@xxxxxxxxx>
> >     Cc: Eric Dumazet <eric.dumazet@xxxxxxxxx>
> >     Cc: Sebastian Andrzej Siewior <sebastian@xxxxxxxxxxxxx>
> >     Cc: Mel Gorman <mgorman@xxxxxxx>
> >     Cc: Christoph Lameter <cl@xxxxxxxxx>
> >     Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> >     Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > 
> 
> Ok, thanks.
> 
> > I've retested several times and confirmed that this change leads to the 
> > breakage, and also confirmed that reverting it on top of -rc1 also fixes 
> > the problem.
> > 
> > I've also added some additional instrumentation to my code and confirmed 
> > that the process is blocking on poll(2) while netstat is reporting 
> > data available on the socket.
> > 
> > What can I do to help track this down?
> > 
> 
> Can the following patch be tested please? It is reported to fix an fio
> regression that may be similar to what you are experiencing but has not
> been picked up yet.

This patch appears to resolve things for me as well, at least after a 
couple of passes.  I'll let you know if I see any further problems come up 
with more testing.
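
For anyone who wants to look for the same symptom, here is a minimal sketch of the kind of check described above: poll(2) timing out while the socket still has unread data queued. This is not the instrumentation I actually used. The helper name poll_disagrees_with_queue() is made up for illustration, and it assumes a Linux socket where FIONREAD reports the number of unread bytes.

```c
#include <assert.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from this thread): returns 1 if poll() timed
 * out although the kernel reports unread bytes queued on fd, 0 if the
 * two views agree, and -1 on error.
 *
 * The check is inherently racy: data can arrive between the poll()
 * timeout and the ioctl(), so a single positive result is only a hint.
 */
static int poll_disagrees_with_queue(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int pending = 0;
	int ready;

	assert(fd >= 0);

	ready = poll(&pfd, 1, timeout_ms);
	if (ready < 0)
		return -1;

	/* FIONREAD: number of unread bytes in the receive queue. */
	if (ioctl(fd, FIONREAD, &pending) < 0)
		return -1;

	return ready == 0 && pending > 0;
}
```

On a healthy kernel this should return 0 both for an idle socket and for one with data waiting. Repeated 1s with a generous timeout would match what I was seeing.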

Thanks!
sage


> 
> ---8<---
> From: Alex Shi <alex.shi@xxxxxxxxx>
> Subject: [PATCH] mm: correct page->pfmemalloc to fix deactivate_slab regression
> 
> commit cfd19c5a9ec (mm: only set page->pfmemalloc when
> ALLOC_NO_WATERMARKS was used) tried to narrow down where
> page->pfmemalloc is set, but it missed some places where pfmemalloc
> should be set.
> 
> As a result, in __slab_alloc, the mismatch between pfmemalloc and
> ALLOC_NO_WATERMARKS causes incorrect deactivate_slab() calls on our
> core2 server:
> 
>     64.73%           fio  [kernel.kallsyms]     [k] _raw_spin_lock
>                      |
>                      --- _raw_spin_lock
>                         |
>                         |---0.34%-- deactivate_slab
>                         |          __slab_alloc
>                         |          kmem_cache_alloc
>                         |          |
> 
> That causes a 40% regression in our fio sync write performance.
> 
> This patch moves the check into get_page_from_freelist, which resolves
> this issue.
> 
> Signed-off-by: Alex Shi <alex.shi@xxxxxxxxx>
> ---
>  mm/page_alloc.c |   21 +++++++++++----------
>  1 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 009ac28..07f1924 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1928,6 +1928,17 @@ this_zone_full:
>  		zlc_active = 0;
>  		goto zonelist_scan;
>  	}
> +
> +	if (page)
> +		/*
> +		 * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
> +		 * necessary to allocate the page. The expectation is
> +		 * that the caller is taking steps that will free more
> +		 * memory. The caller should avoid the page being used
> +		 * for !PFMEMALLOC purposes.
> +		 */
> +		page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);
> +
>  	return page;
>  }
>  
> @@ -2389,14 +2400,6 @@ rebalance:
>  				zonelist, high_zoneidx, nodemask,
>  				preferred_zone, migratetype);
>  		if (page) {
> -			/*
> -			 * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
> -			 * necessary to allocate the page. The expectation is
> -			 * that the caller is taking steps that will free more
> -			 * memory. The caller should avoid the page being used
> -			 * for !PFMEMALLOC purposes.
> -			 */
> -			page->pfmemalloc = true;
>  			goto got_pg;
>  		}
>  	}
> @@ -2569,8 +2572,6 @@ retry_cpuset:
>  		page = __alloc_pages_slowpath(gfp_mask, order,
>  				zonelist, high_zoneidx, nodemask,
>  				preferred_zone, migratetype);
> -	else
> -		page->pfmemalloc = false;
>  
>  	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
>  
> -- 
> 1.7.5.4
> 
> 
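
To make the effect of the quoted patch concrete, here is a rough userspace model of the assignment it adds at the common exit of get_page_from_freelist(). It is only a sketch: struct page_model and finish_alloc() are hypothetical stand-ins, and the flag value is illustrative rather than copied from the kernel headers. The point is that the flag is derived from alloc_flags for every page returned, so it is both set and cleared in one place instead of separately in the fast and slow paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative value; the real definition lives in the kernel's mm code. */
#define ALLOC_NO_WATERMARKS 0x04

/* Hypothetical stand-in for struct page, reduced to the one field of interest. */
struct page_model {
	bool pfmemalloc;
};

/*
 * Models the common exit path after the patch: any page that is
 * returned gets its flag derived from alloc_flags, so a stale value
 * left over from an earlier allocation is always overwritten.
 */
static struct page_model *finish_alloc(struct page_model *page, int alloc_flags)
{
	if (page)
		page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);
	return page;
}

/* Small self-check showing both directions of the assignment. */
static void finish_alloc_demo(void)
{
	struct page_model p = { .pfmemalloc = true };

	/* Normal allocation: a stale "true" is cleared. */
	finish_alloc(&p, 0);
	assert(!p.pfmemalloc);

	/* Allocation that ignored the watermarks: the flag is set. */
	finish_alloc(&p, ALLOC_NO_WATERMARKS);
	assert(p.pfmemalloc);
}
```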