Hi Dave

On Thu, Jun 25, 2020 at 05:34:59PM -0700, Dave Hansen wrote:
>
> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
> sysctl. Like a good kernel developer, I also went to go update the
> documentation. I noticed that the bits in the documentation didn't
> match the bits in the #defines.

Drop this paragraph from the commit message. It doesn't add any
necessary information. Please have a look at

  https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

> The VM evidently stopped caring about RECLAIM_ZONE at some point (or
> never cared) and the #define itself was later removed as a cleanup.
> Those things by themselves are fine.
>
> But, the _other_ bit locations also got changed. That's not OK because
> the bit values are documented to mean one specific thing and users
> surely rely on them meaning that one thing and not changing from
> kernel to kernel. The end result is that if someone had a script
> that did:
>
> 	sysctl vm.zone_reclaim_mode=1
>
> That script went from doing nothing to writing out pages during
> node reclaim after the commit in question. That's not great.
>
> Put the bits back the way they were and add a comment so something
> like this is a bit harder to do again. Update the documentation to
> make it clear that the first bit is ignored.
>
> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Fixes: commit 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
> Acked-by: Ben Widawsky <ben.widawsky@xxxxxxxxx>
> Cc: Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx>
> Cc: Daniel Wagner <dwagner@xxxxxxx>
> Cc: "Tobin C. Harding" <tobin@xxxxxxxxxx>
> Cc: Christoph Lameter <cl@xxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxx
> ---
>
>  b/Documentation/admin-guide/sysctl/vm.rst |   12 ++++++------
>  b/mm/vmscan.c                             |    9 +++++++--
>  2 files changed, 13 insertions(+), 8 deletions(-)
>
> diff -puN mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi mm/vmscan.c
> --- a/mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi	2020-06-25 17:32:11.559165912 -0700
> +++ b/mm/vmscan.c	2020-06-25 17:32:11.572165912 -0700
> @@ -4090,8 +4090,13 @@ module_init(kswapd_init)
>   */
>  int node_reclaim_mode __read_mostly;
>
> -#define RECLAIM_WRITE (1<<0)	/* Writeout pages during reclaim */
> -#define RECLAIM_UNMAP (1<<1)	/* Unmap pages during reclaim */
> +/*
> + * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
> + * ABI. New bits are OK, but existing bits can never change.
> + */
> +#define RECLAIM_RSVD  (1<<0)	/* (currently ignored/unused) */
> +#define RECLAIM_WRITE (1<<1)	/* Writeout pages during reclaim */
> +#define RECLAIM_UNMAP (1<<2)	/* Unmap pages during reclaim */
>
>  /*
>   * Priority for NODE_RECLAIM. This determines the fraction of pages
> diff -puN Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi Documentation/admin-guide/sysctl/vm.rst
> --- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi	2020-06-25 17:32:11.562165912 -0700
> +++ b/Documentation/admin-guide/sysctl/vm.rst	2020-06-25 17:32:11.572165912 -0700
> @@ -938,7 +938,7 @@ in the system.
>  This is value OR'ed together of
>
>  =	===================================
> -1	Zone reclaim on
> +1	(bit currently ignored)
>  2	Zone reclaim writes dirty pages out
>  4	Zone reclaim swaps pages
>  =	===================================
> @@ -948,11 +948,11 @@ that benefit from having their data cach
>  left disabled as the caching effect is likely to be more important than
>  data locality.
>
> -zone_reclaim may be enabled if it's known that the workload is partitioned
> -such that each partition fits within a NUMA node and that accessing remote
> -memory would cause a measurable performance reduction. The page allocator
> -will then reclaim easily reusable pages (those page cache pages that are
> -currently not used) before allocating off node pages.
> +Consider enabling one or more zone_reclaim mode bits if it's known that the
> +workload is partitioned such that each partition fits within a NUMA node
> +and that accessing remote memory would cause a measurable performance
> +reduction. The page allocator will take additional actions before
> +allocating off node pages.

I think the documentation update should not be part of this patch. It
makes backporting to stable more difficult.

Thanks,
Daniel
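
For reference, the ABI problem described in the quoted commit message can be
sketched in user space. This is only an illustration, assuming the bit values
from the mm/vmscan.c hunk above; the helper names and the BROKEN_RECLAIM_WRITE
macro are made up for this example and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as restored by the quoted patch. */
#define RECLAIM_RSVD  (1 << 0) /* currently ignored/unused */
#define RECLAIM_WRITE (1 << 1) /* write out pages during reclaim */
#define RECLAIM_UNMAP (1 << 2) /* unmap pages during reclaim */

/* Hypothetical name: the RECLAIM_WRITE value before this fix. */
#define BROKEN_RECLAIM_WRITE (1 << 0)

/* Hypothetical helpers that decode a vm.zone_reclaim_mode value. */
static bool mode_writes_pages(int mode)
{
	return (mode & RECLAIM_WRITE) != 0;
}

static bool mode_unmaps_pages(int mode)
{
	return (mode & RECLAIM_UNMAP) != 0;
}

/* Same decode, but with the shifted bit layout the patch reverts. */
static bool broken_mode_writes_pages(int mode)
{
	return (mode & BROKEN_RECLAIM_WRITE) != 0;
}

/* Walks through the "sysctl vm.zone_reclaim_mode=1" case. */
static void check_restored_abi(void)
{
	/* With the restored layout, a value of 1 sets only the ignored bit. */
	assert(!mode_writes_pages(1));
	assert(!mode_unmaps_pages(1));

	/* With the shifted layout, the same value enabled page writeout. */
	assert(broken_mode_writes_pages(1));

	/* Documented values 2 and 4 map to writeout and unmap respectively. */
	assert(mode_writes_pages(2));
	assert(mode_unmaps_pages(4));
}
```

Calling check_restored_abi() passes silently, which matches the commit
message: a script writing 1 does nothing with the restored bits, but turned on
writeout with the shifted ones.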