On Fri, Jul 08, 2016 at 11:44:47AM +0900, Joonsoo Kim wrote: > > > > @@ -3390,12 +3386,24 @@ static int kswapd(void *p) > > > > * We can speed up thawing tasks if we don't call balance_pgdat > > > > * after returning from the refrigerator > > > > */ > > > > - if (!ret) { > > > > - trace_mm_vmscan_kswapd_wake(pgdat->node_id, order); > > > > + if (ret) > > > > + continue; > > > > > > > > - /* return value ignored until next patch */ > > > > - balance_pgdat(pgdat, order, classzone_idx); > > > > - } > > > > + /* > > > > + * Reclaim begins at the requested order but if a high-order > > > > + * reclaim fails then kswapd falls back to reclaiming for > > > > + * order-0. If that happens, kswapd will consider sleeping > > > > + * for the order it finished reclaiming at (reclaim_order) > > > > + * but kcompactd is woken to compact for the original > > > > + * request (alloc_order). > > > > + */ > > > > + trace_mm_vmscan_kswapd_wake(pgdat->node_id, alloc_order); > > > > + reclaim_order = balance_pgdat(pgdat, alloc_order, classzone_idx); > > > > + if (reclaim_order < alloc_order) > > > > + goto kswapd_try_sleep; > > > > > > This 'goto' would cause kswapd to sleep prematurely. We need to check > > > *new* pgdat->kswapd_order and classzone_idx even in this case. > > > > > > > It only matters if the next request coming is also high-order requests but > > one thing that needs to be avoided is kswapd staying awake periods of time > > constantly reclaiming for high-order pages. This is why the check means > > "If we reclaimed for high-order and failed, then consider sleeping now". > > If allocations still require it, they direct reclaim instead. > > But, assume that next request is zone-constrained allocation. We need > to balance memory for it but kswapd would skip it. > Then it'll also be woken up again in the very near future as the zone-constrained allocation. If the zone is at the min watermark, then it'll have direct reclaimed but between min and low, it'll be a simple wakeup. The premature sleep, wakeup with new requests logic was a complete mess. However, what I did do is remove the -1 handling of kswapd_classzone_idx handling and the goto full-sleep. In the event of a premature wakeup, it'll recheck for wakeups and if none has occured, it'll use the old classzone information. Note that it will *not* use the original allocation order if it's a premature sleep. This is because it's known that high-order reclaim failed in the near past and restarting it has a high risk of overreclaiming. > > > And, I'd like to know why max() is used for classzone_idx rather than > > > min()? I think that kswapd should balance the lowest zone requested. > > > > > > > If there are two allocation requests -- one zone-constraned and the other > > zone-unconstrained, it does not make sense to have kswapd skip the pages > > usable for the zone-unconstrained and waste a load of CPU. You could > > I agree that, in this case, it's not good to skip the pages usable > for the zone-unconstrained request. But, what I am concerned is that > kswapd stop reclaim prematurely in the view of zone-constrained > requestor. It doesn't stop reclaiming for the lower zones. It's reclaiming the LRU for the whole node that may or may not have lower zone pages at the end of the LRU. If it does, then the allocation request will be satisfied. If it does not, then kswapd will think the node is balanced and get rewoken to do a zone-constrained reclaim pass. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>