Re: [RFC][PATCH 01/13] mm/vmscan: restore zone_reclaim_mode ABI

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 25, 2021 at 04:34:13PM -0800, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> 
> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
> sysctl.  Like a good kernel developer, I also went to go update the
> documentation.  I noticed that the bits in the documentation didn't
> match the bits in the #defines.
> 
> The VM never explicitly checks the RECLAIM_ZONE bit.  The bit is,
> however implicitly checked when checking 'node_reclaim_mode==0'.
> The RECLAIM_ZONE #define was removed in a cleanup.  That, by itself
> is fine.
> 
> But, when the bit was removed (bit 0) the _other_ bit locations also
> got changed.  That's not OK because the bit values are documented to
> mean one specific thing and users surely rely on them meaning that one
> thing and not changing from kernel to kernel.  The end result is that
> if someone had a script that did:
> 
> 	sysctl vm.zone_reclaim_mode=1
> 
> This script would have gone from enalbing node reclaim for clean
> unmapped pages to writing out pages during node reclaim after the
> commit in question.  That's not great.
> 
> Put the bits back the way they were and add a comment so something
> like this is a bit harder to do again.  Update the documentation to
> make it clear that the first bit is ignored.
> 
> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Fixes: 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
> Reviewed-by: Ben Widawsky <ben.widawsky@xxxxxxxxx>
> Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
> Acked-by: Christoph Lameter <cl@xxxxxxxxx>
> Cc: Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx>
> Cc: Daniel Wagner <dwagner@xxxxxxx>
> Cc: "Tobin C. Harding" <tobin@xxxxxxxxxx>
> Cc: Christoph Lameter <cl@xxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Huang Ying <ying.huang@xxxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Qian Cai <cai@xxxxxx>
> Cc: Daniel Wagner <dwagner@xxxxxxx>
> Cc: osalvador <osalvador@xxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx

Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>

> 
> --
> 
> Changes from v2:
>  * Update description to indicate that bit0 was used for clean
>    unmapped page node reclaim.
> ---
> 
>  b/Documentation/admin-guide/sysctl/vm.rst |   10 +++++-----
>  b/mm/vmscan.c                             |    9 +++++++--
>  2 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff -puN Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi Documentation/admin-guide/sysctl/vm.rst
> --- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi	2021-01-25 16:23:06.048866718 -0800
> +++ b/Documentation/admin-guide/sysctl/vm.rst	2021-01-25 16:23:06.056866718 -0800
> @@ -978,11 +978,11 @@ that benefit from having their data cach
>  left disabled as the caching effect is likely to be more important than
>  data locality.
>  
> -zone_reclaim may be enabled if it's known that the workload is partitioned
> -such that each partition fits within a NUMA node and that accessing remote
> -memory would cause a measurable performance reduction.  The page allocator
> -will then reclaim easily reusable pages (those page cache pages that are
> -currently not used) before allocating off node pages.
> +Consider enabling one or more zone_reclaim mode bits if it's known that the
> +workload is partitioned such that each partition fits within a NUMA node
> +and that accessing remote memory would cause a measurable performance
> +reduction.  The page allocator will take additional actions before
> +allocating off node pages.
>  
>  Allowing zone reclaim to write out pages stops processes that are
>  writing large amounts of data from dirtying pages on other nodes. Zone
> diff -puN mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi mm/vmscan.c
> --- a/mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi	2021-01-25 16:23:06.052866718 -0800
> +++ b/mm/vmscan.c	2021-01-25 16:23:06.057866718 -0800
> @@ -4086,8 +4086,13 @@ module_init(kswapd_init)
>   */
>  int node_reclaim_mode __read_mostly;
>  
> -#define RECLAIM_WRITE (1<<0)	/* Writeout pages during reclaim */
> -#define RECLAIM_UNMAP (1<<1)	/* Unmap pages during reclaim */
> +/*
> + * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
> + * ABI.  New bits are OK, but existing bits can never change.
> + */
> +#define RECLAIM_ZONE  (1<<0)   /* Run shrink_inactive_list on the zone */
> +#define RECLAIM_WRITE (1<<1)   /* Writeout pages during reclaim */
> +#define RECLAIM_UNMAP (1<<2)   /* Unmap pages during reclaim */
>  
>  /*
>   * Priority for NODE_RECLAIM. This determines the fraction of pages
> _
> 

-- 
Oscar Salvador
SUSE L3




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux