On Mon, Dec 6, 2021 at 11:01 AM David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 06.12.21 19:42, Yang Shi wrote: > > On Mon, Dec 6, 2021 at 5:19 AM Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote: > >> > >> On 06.12.2021 13:45, David Hildenbrand wrote: > >>>> This doesn't seen complete. Slab shrinkers are used in the reclaim > >>>> context. Previously offline nodes could be onlined later and this would > >>>> lead to NULL ptr because there is no hook to allocate new shrinker > >>>> infos. This would be also really impractical because this would have to > >>>> update all existing memcgs... > >>> > >>> Instead of going through the trouble of updating... > >>> > >>> ... maybe just keep for_each_node() and check if the target node is > >>> offline. If it's offline, just allocate from the first online node. > >>> After all, we're not using __GFP_THISNODE, so there are no guarantees > >>> either way ... > >> > >> Hm, can't we add shrinker maps allocation to __try_online_node() in addition > >> to this patch? > > > > I think the below fix (an example, doesn't cover all affected > > callsites) should be good enough for now? It doesn't touch the hot > > path of the page allocator. > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index fb9584641ac7..1252a33f7c28 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -222,13 +222,15 @@ static int expand_one_shrinker_info(struct > > mem_cgroup *memcg, > > int size = map_size + defer_size; > > > > for_each_node(nid) { > > + int tmp = nid; > > pn = memcg->nodeinfo[nid]; > > old = shrinker_info_protected(memcg, nid); > > /* Not yet online memcg */ > > if (!old) > > return 0; > > - > > - new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid); > > + if (!node_online(nid)) > > + tmp = -1; > > + new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, tmp); > > if (!new) > > return -ENOMEM; > > > > It used to use kvmalloc instead of kvmalloc_node(). The commit > > 86daf94efb11d7319fbef5e480018c4807add6ef ("mm/memcontrol.c: allocate > > shrinker_map on appropriate NUMA node") changed to use *_node() > > version. The justification was that "kswapd is always bound to > > specific node. So allocate shrinker_map from the related NUMA node to > > respect its NUMA locality." There is no kswapd for offlined node, so > > just allocate shrinker info on node 0. This is also what > > alloc_mem_cgroup_per_node_info() does. > > Yes, that's what I refer to as fixing it in the caller -- similar to > [1]. Michals point is to not require such node_online() checks at all, > neither in the caller nor in the buddy. > > I see 2 options short-term > > 1) What we have in [1]. > 2) What I proposed in [2], fixing it for all such instances until we > have something better. > > Long term I tend to agree that what Michal proposes is better. > > Short term I tend to like [2], because it avoids having to mess with all > such instances to eventually get it right and the temporary overhead > until we have the code reworked should be really negligible ... Thanks, David. Basically either option looks fine to me. But I'm a little bit concerned about [2]. It silently changes the node requested by the callers. It actually papers over potential bugs? And what if the callers specify __GFP_THISNODE (I didn't search if such callers really exist in the current code)? How's about a helper function, for example, called kvmalloc_best_node()? It does: void * kvmalloc_best_node(unsigned long size, int flag, int nid) { bool onlined = node_online(nid); WARN_ON_ONCE((flag & __GFP_THISNODE) && !onlined); if (!onlined) nid = -1; return kvmalloc_node(size, GFP_xxx, nid); } > > > > [1] https://lkml.kernel.org/r/20211108202325.20304-1-amakhalov@xxxxxxxxxx > [2] > https://lkml.kernel.org/r/51c65635-1dae-6ba4-daf9-db9df0ec35d8@xxxxxxxxxx > > -- > Thanks, > > David / dhildenb >