On Tue, Dec 7, 2021 at 3:44 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Tue, 7 Dec 2021 17:40:13 -0500 Nico Pache <npache@xxxxxxxxxx> wrote: > > > We have run into a panic caused by a shrinker allocation being attempted > > on an offlined node. > > > > Our crash analysis has determined that the issue originates from trying > > to allocate pages on an offlined node in expand_one_shrinker_info. This > > function makes the incorrect assumption that we can allocate on any node. > > To correct this we make sure the node is online before tempting an > > allocation. If it is not online choose the closest node. > > This isn't fully accurate, is it? We could allocate on a node which is > presently offline but which was previously onlined, by testing > NODE_DATA(nid). > > It isn't entirely clear to me from the v1 discussion why this approach > isn't being taken? > > AFAICT the proposed patch is *already* taking this approach, by having > no protection against a concurrent or subsequent node offlining? AFAICT, we have not reached agreement on how to fix it yet. I saw 3 proposals at least: 1. From Michal, allocate node data for all possible nodes. https://lore.kernel.org/all/Ya89aqij6nMwJrIZ@xxxxxxxxxxxxxx/T/#u 2. What this patch does. Proposed originally from https://lore.kernel.org/all/20211108202325.20304-1-amakhalov@xxxxxxxxxx/T/#u 3. From David, fix in node_zonelist(). https://lore.kernel.org/all/51c65635-1dae-6ba4-daf9-db9df0ec35d8@xxxxxxxxxx/T/#u > > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -222,13 +222,16 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg, > > int size = map_size + defer_size; > > > > for_each_node(nid) { > > + int tmp = nid; > > Not `tmp', please. Better to use an identifier which explains the > variable's use. target_nid? > > And a newline after defining locals, please. > > > pn = memcg->nodeinfo[nid]; > > old = shrinker_info_protected(memcg, nid); > > /* Not yet online memcg */ > > if (!old) > > return 0; > > > > - new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid); > > + if(!node_online(nid)) > > s/if(/if (/ > > > + tmp = numa_mem_id(); > > + new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, tmp); > > if (!new) > > return -ENOMEM; > > > > And a code comment fully explaining what's going on here?