On Thu 27-04-23 02:21:30, Yosry Ahmed wrote: > On Wed, Apr 26, 2023 at 8:27 AM Michal Hocko <mhocko@xxxxxxxx> wrote: > > > > On Wed 26-04-23 13:39:19, Yosry Ahmed wrote: > > > Commit c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup > > > OOM") made sure we dump all the stats in memory.stat during a cgroup > > > OOM, but it also introduced a slight behavioral change. The code used to > > > print the non-hierarchical v1 cgroup stats for the entire cgroup > > > subtree, not it only prints the v2 cgroup stats for the cgroup under > > > OOM. > > > > > > Although v2 stats are a superset of v1 stats, some of them have > > > different naming. We also lost the non-hierarchical stats for the cgroup > > > under OOM in v1. > > > > Why is that a problem worth solving? It would be also nice to add an > > example of the oom report before and after the patch. > > -- > > Michal Hocko > > SUSE Labs > > Thanks for taking a look! > > The problem is that when upgrading to a kernel that contains > c8713d0b2312 on cgroup v1, the OOM logs suddenly change. The stats > names become different, a couple of stats are gone, and the > non-hierarchical stats disappear. > > The non-hierarchical stats are important to identify if a memcg OOM'd > because of the memory consumption of its own processes or its > descendants. In the example below, I created a parent memcg "a", and a > child memcg "b". A process in "a" itself ("tail" in this case) is > hogging memory and causing an OOM, not the processes in the child "b" > (the "sleep" processes). With non-hierarchical stats, it's clear that > this is the case. Is this difference really important from the OOM POV. There is no group oom semantic in v1 and so it always boils down to a specific process that gets selected. Which memcg it is sitting in shouldn't matter all that much. Or does it really matter? > Also, it is generally nice to keep things consistent as much as > possible. The sudden change of the OOM log with the kernel upgrade is > confusing, especially that the memcg stats in the OOM logs in cgroup > v1 now look different from the stats in memory.stat. Generally speaking oom report is not carved into stone. While we shouldn't make changes just nilly willy it might change for implementation specific reasons. In this particular case I would agree that the new output is more confusing than helpful. Just look at > [ 88.339505] pgscan 0 > [ 88.339505] pgsteal 0 > [ 88.339506] pgscan_kswapd 0 > [ 88.339506] pgscan_direct 0 > [ 88.339507] pgscan_khugepaged 0 > [ 88.339507] pgsteal_kswapd 0 > [ 88.339508] pgsteal_direct 0 > [ 88.339508] pgsteal_khugepaged 0 These stats are actively misleading because it would suggest there was no memory reclaim done before oom was hit and that would imply a potentially premature OOM killer invocation (thus a bug). There are likely other stats which are not tracked in v1 yet they are reported that might add to the confusion. I believe this would be a sound justification to get back to the original reporting. -- Michal Hocko SUSE Labs