On Tue, Mar 07, 2017 at 11:56:31AM -0500, Johannes Weiner wrote:
> On Tue, Mar 07, 2017 at 11:17:02AM +0100, Michal Hocko wrote:
> > On Mon 06-03-17 11:24:10, Johannes Weiner wrote:
> > > @@ -3271,7 +3271,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
> > >  		 * Raise priority if scanning rate is too low or there was no
> > >  		 * progress in reclaiming pages
> > >  		 */
> > > -		if (raise_priority || !sc.nr_reclaimed)
> > > +		nr_reclaimed = sc.nr_reclaimed - nr_reclaimed;
> > > +		if (raise_priority || !nr_reclaimed)
> > >  			sc.priority--;
> > >  	} while (sc.priority >= 1);
> > >
> >
> > I would rather not play with the sc state here. From a quick look at
> > least
> > 	/*
> > 	 * Fragmentation may mean that the system cannot be rebalanced for
> > 	 * high-order allocations. If twice the allocation size has been
> > 	 * reclaimed then recheck watermarks only at order-0 to prevent
> > 	 * excessive reclaim. Assume that a process requested a high-order
> > 	 * can direct reclaim/compact.
> > 	 */
> > 	if (sc->order && sc->nr_reclaimed >= compact_gap(sc->order))
> > 		sc->order = 0;
> >
> > does rely on the value. Wouldn't something like the following be safer?
>
> Well, what behavior is correct, though? This check looks like an
> argument *against* resetting sc.nr_reclaimed.
>
> If kswapd is woken up for a higher order, this check sets a reclaim
> cutoff beyond which it should give up on the order and balance for 0.
>
> That's on the scope of the kswapd invocation. Applying this threshold
> to the outcome of just the preceeding priority seems like a mistake.
>
> Mel? Vlastimil?

I cannot say which is definitely the correct behaviour. The current
behaviour is conservative due to the historical concerns about kswapd
reclaiming the world. The hazard as I see it is that resetting it *may*
lead to more aggressive reclaim for high-order allocations.
That may be a welcome outcome for those who really want high-order pages
and an unwelcome one for others who prefer pages to remain resident.

However, in this case it's a tight window and problems would be tricky
to detect. THP allocations won't trigger the behaviour and, with
vmalloc'd stacks, I'd expect that only SLUB-intensive workloads using
high-order pages would trigger any adverse behaviour. While I'm mildly
concerned, I would be a little surprised if it actually caused runaway
reclaim.

-- 
Mel Gorman
SUSE Labs
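
To illustrate the hazard discussed above, here is a rough userspace sketch, not kernel code, of how the quoted high-order cutoff behaves when the reclaim counter accumulates across priorities versus when it is reset at each priority. The struct, simulate() and its parameters are hypothetical names made up for this illustration. compact_gap() is assumed to be twice the allocation size, following the quoted comment, and the loop is simplified to lower the priority on every pass with a fixed number of pages reclaimed per pass.

```c
#include <assert.h>

/* Hypothetical stand-in for the few scan_control fields the check reads. */
struct sc_sketch {
	unsigned long nr_reclaimed;
	unsigned int order;
	int priority;
};

/* Assumption: "twice the allocation size", as the quoted comment puts it. */
static unsigned long compact_gap(unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long) - 1);
	return 2UL << order;
}

/* The cutoff check quoted in the thread: give up on the order once
 * enough pages have been reclaimed, and balance for order-0 instead. */
static void highorder_cutoff(struct sc_sketch *sc)
{
	if (sc->order && sc->nr_reclaimed >= compact_gap(sc->order))
		sc->order = 0;
}

/*
 * Simplified priority loop. Each pass reclaims pages_per_pass pages.
 * If reset_each_pass is non-zero, the counter is cleared before every
 * pass, so the cutoff only ever sees one priority's worth of progress.
 * Returns the priority at which the order dropped to 0, or -1 if the
 * loop ran out of priorities while still balancing for the high order.
 */
static int simulate(unsigned int order, unsigned long pages_per_pass,
		    int reset_each_pass)
{
	struct sc_sketch sc = { .nr_reclaimed = 0, .order = order,
				.priority = 12 };

	do {
		if (reset_each_pass)
			sc.nr_reclaimed = 0;
		sc.nr_reclaimed += pages_per_pass;
		highorder_cutoff(&sc);
		if (!sc.order)
			return sc.priority;
		sc.priority--;
	} while (sc.priority >= 1);

	return -1;
}
```

With order 3 and 4 pages per pass, simulate(3, 4, 0) hits the 16-page cutoff on the fourth pass, while simulate(3, 4, 1) never does and keeps balancing for the high order through every priority. Real reclaim progress is nowhere near this uniform, so this only shows the direction of the effect, not its size.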