On 6/21/19 1:43 PM, Alan Jenkins wrote: > When setting the low and high watermarks we use min_wmark_pages(zone). > I guess this is to reduce the line length. But we forgot that this macro > includes zone->watermark_boost. We need to reset zone->watermark_boost > first. Otherwise the watermarks will be set inconsistently. > > E.g. this could cause inconsistent values if the watermarks have been > boosted, and then you change a sysctl which triggers > __setup_per_zone_wmarks(). > > I strongly suspect this explains why I have seen slightly high watermarks. > Suspicious-looking zoneinfo below - notice high-low != low-min. > > Node 0, zone Normal > pages free 74597 > min 9582 > low 34505 > high 36900 > > https://unix.stackexchange.com/questions/525674/my-low-and-high-watermarks-seem-higher-than-predicted-by-documentation-sysctl-vm/525687 > > Signed-off-by: Alan Jenkins <alan.christopher.jenkins@xxxxxxxxx> > Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external > fragmentation event occurs") > Cc: stable@xxxxxxxxxxxxxxx Nice catch, thanks! Acked-by: Vlastimil Babka <vbabka@xxxxxxx> Personally I would implement it a bit differently, see below. If you agree, it's fine if you keep the authorship of the whole patch. > --- > > Tested by compiler :-). > > Ideally the commit message would be clear about what happens the > *first* time __setup_per_zone_watermarks() is called. I guess that > zone->watermark_boost is *usually* zero, or we would have noticed > some wild problems :-). However I am not familiar with how the zone > structures are allocated & initialized. Maybe there is a case where > zone->watermark_boost could contain an arbitrary unitialized value > at this point. Can we rule that out? Dunno if there's some arch override, but generic_alloc_nodedata() uses kzalloc() so it's zeroed. -----8<----- diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d66bc8abe0af..3b2f0cedf78e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -7624,6 +7624,7 @@ static void __setup_per_zone_wmarks(void) for_each_zone(zone) { u64 tmp; + unsigned long wmark_min; spin_lock_irqsave(&zone->lock, flags); tmp = (u64)pages_min * zone_managed_pages(zone); @@ -7642,13 +7643,13 @@ static void __setup_per_zone_wmarks(void) min_pages = zone_managed_pages(zone) / 1024; min_pages = clamp(min_pages, SWAP_CLUSTER_MAX, 128UL); - zone->_watermark[WMARK_MIN] = min_pages; + wmark_min = min_pages; } else { /* * If it's a lowmem zone, reserve a number of pages * proportionate to the zone's size. */ - zone->_watermark[WMARK_MIN] = tmp; + wmark_min = tmp; } /* @@ -7660,8 +7661,9 @@ static void __setup_per_zone_wmarks(void) mult_frac(zone_managed_pages(zone), watermark_scale_factor, 10000)); - zone->_watermark[WMARK_LOW] = min_wmark_pages(zone) + tmp; - zone->_watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2; + zone->_watermark[WMARK_MIN] = wmark_min; + zone->_watermark[WMARK_LOW] = wmark_min + tmp; + zone->_watermark[WMARK_HIGH] = wmark_min + tmp * 2; zone->watermark_boost = 0; spin_unlock_irqrestore(&zone->lock, flags);