On Wed 18-12-13 14:20:15, Johannes Weiner wrote:
> On Wed, Dec 18, 2013 at 05:20:50PM +0100, Michal Hocko wrote:
[...]
> > Currently we have a per-process (cpuset in fact) flag but this will
> > change it to all or nothing. Is this really a good step?
> > Btw. I do not mind having PF_SPREAD_PAGE enabled by default.
>
> I don't want to muck around with cpusets too much, tbh... but I agree
> that the behavior of PF_SPREAD_PAGE should be the default.  Except it
> should honor zone_reclaim_mode and round-robin nodes that are within
> RECLAIM_DISTANCE of the local one.

Agreed.

> I will have spotty access to internet starting tomorrow night until
> New Year's.  Is there a chance we can maybe revert the NUMA aspects of
> the original patch for now and leave it as a node-local zone fairness
> thing?

Yes, that sounds perfectly reasonable to me.

> The NUMA behavior was so broken on 3.12 that I doubt that
> people have come to rely on the cache fairness on such machines in
> that one release.  So we should be able to release 3.12-stable and
> 3.13 with node-local zone fairness without regressing anybody, and
> then give the NUMA aspect of it another try in 3.14.
>
> Something like the following should restore NUMA behavior while still
> fixing the kswapd vs. page allocator interaction bug of thrashing on
> the highest zone.

Yes, it looks good to me. I guess zone_local could have stayed as it was
because it shouldn't be a big deal to fall back to a different node if
the distance is LOCAL, but taking a conservative approach is not
harmful.

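To make the zone_local() remark concrete, here is a minimal user-space
sketch of the two checks, not kernel code. The distance table, NR_NODES,
the cut-down struct zone and the count_local() helper are all
hypothetical; the table is contrived so that two distinct nodes report
LOCAL_DISTANCE to each other, which a real machine may never do. Only
LOCAL_DISTANCE (10) and the two comparisons mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define LOCAL_DISTANCE 10	/* a node's distance to itself in the kernel */
#define NR_NODES 3		/* hypothetical machine size */

/* Hypothetical distance table: nodes 0 and 1 claim to be "local" to each other. */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 10, 20 },
	{ 10, 10, 20 },
	{ 20, 20, 10 },
};

/* Cut-down stand-in for the kernel's struct zone; only the node id matters here. */
struct zone {
	int node;
};

int node_distance(int a, int b)
{
	assert(a >= 0 && a < NR_NODES && b >= 0 && b < NR_NODES);
	return distance[a][b];
}

/* Old check: any zone whose node is at LOCAL_DISTANCE counts as local. */
bool zone_local_by_distance(const struct zone *local_zone, const struct zone *zone)
{
	return node_distance(local_zone->node, zone->node) == LOCAL_DISTANCE;
}

/* New, more conservative check: only zones on the very same node count. */
bool zone_local_by_node(const struct zone *local_zone, const struct zone *zone)
{
	return local_zone->node == zone->node;
}

typedef bool (*zone_local_fn)(const struct zone *, const struct zone *);

/* Hypothetical helper: how many zones would take part in the fairness round-robin. */
int count_local(zone_local_fn is_local, const struct zone *preferred,
		const struct zone *zones, int nr_zones)
{
	int i, n = 0;

	for (i = 0; i < nr_zones; i++)
		if (is_local(preferred, &zones[i]))
			n++;
	return n;
}
```

With one zone on each of the three nodes and node 0 preferred, the
distance-based check admits the zones on nodes 0 and 1, while the
node-id check admits only the zone on node 0. On a table where
LOCAL_DISTANCE appears only on the diagonal the two checks agree.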
> PS: zone_local() is in a CONFIG_NUMA block, which
> is why accessing zone->node is safe :-)
>
> ---
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dd886fac451a..317ea747d2cd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1822,7 +1822,7 @@ static void zlc_clear_zones_full(struct zonelist *zonelist)
>  
>  static bool zone_local(struct zone *local_zone, struct zone *zone)
>  {
> -	return node_distance(local_zone->node, zone->node) == LOCAL_DISTANCE;
> +	return local_zone->node == zone->node;
>  }
>  
>  static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
> @@ -1919,18 +1919,17 @@ get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
>  		 * page was allocated in should have no effect on the
>  		 * time the page has in memory before being reclaimed.
>  		 *
> -		 * When zone_reclaim_mode is enabled, try to stay in
> -		 * local zones in the fastpath.  If that fails, the
> -		 * slowpath is entered, which will do another pass
> -		 * starting with the local zones, but ultimately fall
> -		 * back to remote zones that do not partake in the
> -		 * fairness round-robin cycle of this zonelist.
> +		 * Try to stay in local zones in the fastpath.  If
> +		 * that fails, the slowpath is entered, which will do
> +		 * another pass starting with the local zones, but
> +		 * ultimately fall back to remote zones that do not
> +		 * partake in the fairness round-robin cycle of this
> +		 * zonelist.
>  		 */
>  		if (alloc_flags & ALLOC_WMARK_LOW) {
>  			if (zone_page_state(zone, NR_ALLOC_BATCH) <= 0)
>  				continue;
> -			if (zone_reclaim_mode &&
> -			    !zone_local(preferred_zone, zone))
> +			if (!zone_local(preferred_zone, zone))
>  				continue;
>  		}
>  		/*
> @@ -2396,7 +2395,7 @@ static void prepare_slowpath(gfp_t gfp_mask, unsigned int order,
>  		 * thrash fairness information for zones that are not
>  		 * actually part of this zonelist's round-robin cycle.
>  		 */
> -		if (zone_reclaim_mode && !zone_local(preferred_zone, zone))
> +		if (!zone_local(preferred_zone, zone))
>  			continue;
>  		mod_zone_page_state(zone, NR_ALLOC_BATCH,
>  				    high_wmark_pages(zone) -
>

-- 
Michal Hocko
SUSE Labs