On Thu 20-07-17 08:55:42, Vlastimil Babka wrote:
> On 07/14/2017 10:00 AM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@xxxxxxxx>
> >
> > build_zonelists gradually builds zonelists from the nearest to the most
> > distant node. As we do not know how many populated zones we will have in
> > each node we rely on the _zoneref to terminate initialized part of the
> > zonelist by a NULL zone. While this is functionally correct it is quite
> > suboptimal because we cannot allow updaters to race with zonelists
> > users because they could see an empty zonelist and fail the allocation
> > or hit the OOM killer in the worst case.
> >
> > We can do much better, though. We can store the node ordering into an
> > already existing node_order array and then give this array to
> > build_zonelists_in_node_order and do the whole initialization at once.
> > zonelists consumers still might see halfway initialized state but that
> > should be much more tolerable because the list will not be empty and
> > they would either see some zone twice or skip over some zone(s) in the
> > worst case which shouldn't lead to immediate failures.
> >
> > This patch alone doesn't introduce any functional change yet, though, it
> > is merely a preparatory work for later changes.
> >
> > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>
> I've collected the fold-ups from this thread and looked at the result as
> single patch. Seems OK, just two things:
> - please rename variable "i" in build_zonelists() to e.g. "nr_nodes"
> - the !CONFIG_NUMA variant of build_zonelists() won't build, because it
>   doesn't declare nr_zones variable

Thanks! I will fold this in.
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c0d3e8eeb150..6f192405e469 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4957,7 +4957,7 @@ static void build_thisnode_zonelists(pg_data_t *pgdat)
 static void build_zonelists(pg_data_t *pgdat)
 {
 	static int node_order[MAX_NUMNODES];
-	int node, load, i = 0;
+	int node, load, nr_nodes = 0;
 	nodemask_t used_mask;
 	int local_node, prev_node;
 
@@ -4978,12 +4978,12 @@ static void build_zonelists(pg_data_t *pgdat)
 		    node_distance(local_node, prev_node))
 			node_load[node] = load;
 
-		node_order[i++] = node;
+		node_order[nr_nodes++] = node;
 		prev_node = node;
 		load--;
 	}
 
-	build_zonelists_in_node_order(pgdat, node_order, i);
+	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
 	build_thisnode_zonelists(pgdat);
 }
 
@@ -5013,10 +5013,11 @@ static void build_zonelists(pg_data_t *pgdat)
 {
 	int node, local_node;
 	struct zoneref *zonerefs;
+	int nr_zones;
 
 	local_node = pgdat->node_id;
 
-	zonrefs = pgdat->node_zonelists[ZONELIST_FALLBACK]._zonerefs;
+	zonerefs = pgdat->node_zonelists[ZONELIST_FALLBACK]._zonerefs;
 	nr_zones = build_zonerefs_node(pgdat, zonerefs);
 	zonerefs += nr_zones;
 
-- 
Michal Hocko
SUSE Labs