On Tue 13-12-22 17:14:48, Johannes Weiner wrote:
> On Tue, Dec 13, 2022 at 04:41:10PM +0100, Michal Hocko wrote:
> > Hi,
> > I have just noticed that pages allocated for demotion targets
> > includes __GFP_KSWAPD_RECLAIM (through GFP_NOWAIT). This is the case
> > since the code has been introduced by 26aa2d199d6f ("mm/migrate: demote
> > pages during reclaim"). I suspect the intention is to trigger the aging
> > on the fallback node and either drop or further demote oldest pages.
> >
> > This makes sense but I suspect that this wasn't intended also for
> > memcg triggered reclaim. This would mean that a memory pressure in one
> > hierarchy could trigger paging out pages of a different hierarchy if the
> > demotion target is close to full.
>
> This is also true if you don't do demotion. If a cgroup tries to
> allocate memory on a full node (i.e. mbind()), it may wake kswapd or
> enter global reclaim directly which may push out the memory of other
> cgroups, regardless of the respective cgroup limits.

You are right on this. But this is describing a slightly different
situation IMO.

> The demotion allocations don't strike me as any different. They're
> just allocations on behalf of a cgroup. I would expect them to wake
> kswapd and reclaim physical memory as needed.

I am not sure this is expected behavior. Consider the currently
discussed memory.demote interface, where userspace can trigger (almost)
arbitrary demotions. This can deplete fallback nodes without
over-committing the memory overall, yet push out demoted memory from
other workloads. From the user POV it would look like reclaim while the
overall memory is far from depleted, so it would be considered premature
and warrant a bug report.

The reclaim behavior would make more sense to me if it was constrained
to the allocating memcg hierarchy so unrelated lruvecs wouldn't be
disrupted.

-- 
Michal Hocko
SUSE Labs