On Thu, Aug 22, 2024 at 01:35:29AM GMT, Kairui Song wrote:
> Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote on Wed, Aug 21, 2024 at 08:22:
[...]
>
> Hi, thanks for the comments.
>
> > Is this a real issue? Have you seen systems in production with a
> > large amount of memory occupied by anon shadow entries? This is still
> > limited to the amount of swap a cgroup is allowed to use.
>
> No, this patch is cherry-picked from a previous series. It helps
> separate the shadows into different cgroups properly according to my
> tests, and it reduces list_lru lock contention by a lot when combined
> with the later patches. Not very convincing on its own indeed, so I
> hesitated to send it alone.
>

So, list_lru lock contention is the problem you are trying to solve.
Without this patch, do you see less impact from your list_lru series?

Anyway, this patch is not the right way to solve the list_lru lock
contention issue.

> > The reason I am asking is that this solution is worse than the
> > perceived problem, at least to me. With this patch, the kernel will
> > be charging unrelated cgroups for the memory of swap xarray nodes
> > during global reclaim and proactive reclaim.
>
> Yes, this could be a problem.
>
> I didn't observe this happening frequently in tests though. Swap tends
> to cluster swap allocations, and reclaim tends to batch-reclaim pages,
> so usually there is a fairly high chance that shadows of pages of the
> same memcg stay on the same node.
>
> It could end up completely random when the swap device is getting
> fragmented or reclaim is struggling, though.

In actual production, fragmentation and memory over-commit are very
normal, so such scenarios would occur more often.

> > You can reduce this weirdness by using set_active_memcg() in
> > add_to_swap_cache() with the given folio's memcg, but you still have
> > the case of multiple unrelated folios and shadow entries of different
> > cgroups within the same node. For the filesystem case, userspace can
> > control which files are shared between different cgroups and has more
> > control over it. That is not the case for swap space.
>
> Right, this fix is not perfect, and it's arguable whether the new
> behaviour is better or worse than before. There is some ongoing work
> on the swap side, so things may get fixed differently in the future,
> but I'll also check if this patch can be improved.

Yeah, with mTHP we can reevaluate this approach.
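
For reference, the set_active_memcg() change I had in mind is roughly
the sketch below (untested, only to illustrate the idea; the actual
xarray insertion loop in add_to_swap_cache() is represented here by a
hypothetical swap_cache_insert() helper):

	int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
			      gfp_t gfp, void **shadowp)
	{
		struct mem_cgroup *old_memcg;
		int err;

		/*
		 * Make the folio's own memcg the active one so that any
		 * xarray node allocated while inserting into the swap
		 * cache is charged to it, not to the cgroup of whichever
		 * task happens to be doing global or proactive reclaim.
		 */
		old_memcg = set_active_memcg(folio_memcg(folio));

		/* Hypothetical helper standing in for the existing
		 * xarray insertion loop. */
		err = swap_cache_insert(folio, entry, gfp, shadowp);

		set_active_memcg(old_memcg);
		return err;
	}

Note that this only redirects where the charge goes; it does not avoid
shadow entries of different cgroups sharing the same node, as discussed
above.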