On 12/7/21 18:44, Andrew Morton wrote: > On Tue, 7 Dec 2021 17:40:13 -0500 Nico Pache <npache@xxxxxxxxxx> wrote: > >> We have run into a panic caused by a shrinker allocation being attempted >> on an offlined node. >> >> Our crash analysis has determined that the issue originates from trying >> to allocate pages on an offlined node in expand_one_shrinker_info. This >> function makes the incorrect assumption that we can allocate on any node. >> To correct this we make sure the node is online before tempting an >> allocation. If it is not online choose the closest node. > > This isn't fully accurate, is it? We could allocate on a node which is > presently offline but which was previously onlined, by testing > NODE_DATA(nid). Thanks for the review! I took your changes below into consideration for my V3. My knowledge of offlined/onlined nodes is quite limited but after looking into it it doesnt seem like anything clears the state of NODE_DATA(nid) after a try_offline_node is attempted. So theoretically the panic we saw would not happen. What is the expected behavior of trying to allocate a page on a offline node? > > It isn't entirely clear to me from the v1 discussion why this approach > isn't being taken? > > AFAICT the proposed patch is *already* taking this approach, by having > no protection against a concurrent or subsequent node offlining? > >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -222,13 +222,16 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg, >> int size = map_size + defer_size; >> >> for_each_node(nid) { >> + int tmp = nid; > > Not `tmp', please. Better to use an identifier which explains the > variable's use. target_nid? > > And a newline after defining locals, please. > >> pn = memcg->nodeinfo[nid]; >> old = shrinker_info_protected(memcg, nid); >> /* Not yet online memcg */ >> if (!old) >> return 0; >> >> - new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid); >> + if(!node_online(nid)) > > s/if(/if (/ > >> + tmp = numa_mem_id(); >> + new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, tmp); >> if (!new) >> return -ENOMEM; >> > > And a code comment fully explaining what's going on here? >