Re: [PATCH] mm: move FOLL_PIN debug accounting under CONFIG_DEBUG_VM

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/31/23 07:36, Jens Axboe wrote:
> Using FOLL_PIN for mapping user pages caused a performance regression of
> about 2.7%. Looking at profiles, we see:
> 
> +2.71%  [kernel.vmlinux]  [k] mod_node_page_state
> 
> which wasn't there before. The node page state counters are percpu, but
> with a very low threshold. On my setup, every 108th update ends up
> needing to punt to two atomic_lond_add()'s, which is causing this above
> regression.
> 
> As these counters are purely for debug purposes, move them under
> CONFIG_DEBUG_VM rather than do them unconditionally.
> 
> Fixes: fd20d0c1852e ("block: convert bio_map_user_iov to use iov_iter_extract_pages")
> Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
> Link: https://lore.kernel.org/linux-block/f57ee72f-38e9-6afa-182f-2794638eadcb@xxxxxxxxx/
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> 

Yes, that's the exact right fix (in the absence of some magical
high-volume, high performance per-cpu counters anyway). In fact,
originally these counter operations were behind CONFIG_DEBUG_VM in an
early patchset, but the FOLL_PIN feature was new and potentially flaky,
and also not yet in the Direct IO fast path. So during code review we
decided to enable the debugging information unconditionally.

But now we can no longer afford the debug counters in release
builds--but we also no longer need them, because the FOLL_PIN feature
has settled in and had enough soak time to be more confident about it.

Over time, I'd kind of started assuming that these counters were
necessary for release builds, and it required someone else to wake me up
and point out the obvious, so thanks! :)

Reviewed-by: John Hubbard <jhubbard@xxxxxxxxxx>

thanks,
-- 
John Hubbard
NVIDIA

> ---
> 
> I added fixes tags, even though it's not a strict fix for this commits.
> But it does fix a performance regression introduced by those commits.
> It's a useful hint for backporting.
> 
> I'd prefer sticking this at the end of the iov-extract series that
> is already pulled in, so it can go with those patches.
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index cd28a100d9e4..0153ec8a54ae 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -195,8 +195,10 @@ enum node_stat_item {
>  	NR_WRITTEN,		/* page writings since bootup */
>  	NR_THROTTLED_WRITTEN,	/* NR_WRITTEN while reclaim throttled */
>  	NR_KERNEL_MISC_RECLAIMABLE,	/* reclaimable non-slab kernel pages */
> +#ifdef CONFIG_DEBUG_VM
>  	NR_FOLL_PIN_ACQUIRED,	/* via: pin_user_page(), gup flag: FOLL_PIN */
>  	NR_FOLL_PIN_RELEASED,	/* pages returned via unpin_user_page() */
> +#endif
>  	NR_KERNEL_STACK_KB,	/* measured in KiB */
>  #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
>  	NR_KERNEL_SCS_KB,	/* measured in KiB */
> diff --git a/mm/gup.c b/mm/gup.c
> index f45a3a5be53a..41abb16286ec 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -168,7 +168,9 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
>  		 */
>  		smp_mb__after_atomic();
>  
> +#ifdef CONFIG_DEBUG_VM
>  		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
> +#endif
>  
>  		return folio;
>  	}
> @@ -180,7 +182,9 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
>  static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
>  {
>  	if (flags & FOLL_PIN) {
> +#ifdef CONFIG_DEBUG_VM
>  		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
> +#endif
>  		if (folio_test_large(folio))
>  			atomic_sub(refs, folio_pincount_ptr(folio));
>  		else
> @@ -236,8 +240,9 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
>  		} else {
>  			folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
>  		}
> -
> +#ifdef CONFIG_DEBUG_VM
>  		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
> +#endif
>  	}
>  
>  	return 0;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 1ea6a5ce1c41..5cbd9a1924bf 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1227,8 +1227,10 @@ const char * const vmstat_text[] = {
>  	"nr_written",
>  	"nr_throttled_written",
>  	"nr_kernel_misc_reclaimable",
> +#ifdef CONFIG_DEBUG_VM
>  	"nr_foll_pin_acquired",
>  	"nr_foll_pin_released",
> +#endif
>  	"nr_kernel_stack",
>  #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
>  	"nr_shadow_call_stack",
> 






[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux