On Mon, 3 Feb 2014, Vladimir Davydov wrote:

> AFAIU, cgroup identifiers dumped on oom (cgroup paths, currently) and
> memcg slab cache names serve for different purposes.

Sure, you may dump the name for a number of legitimate reasons, but the
problem still exists that it's difficult to determine what memcg is being
referenced without a flat hierarchy and unique memcg names for all
children.

> The point is oom is
> a perfectly normal situation for the kernel, and info dumped to dmesg is
> for admin to find out the cause of the problem (a greedy user or
> cgroup).

Hmm, so if we hand out top-level memcgs to individual jobs or users, like
our userspace does, and they are able to configure their child memcgs as
they wish, and then they or the admin finds in the kernel log that a
memory hog was killed from the memcg with the perfectly anonymous memcg
name of "memcg", how do we determine what job or user triggered that kill?
User id is not going to be conclusive in a production environment with
shared user accounts.

> On the other hand, slab cache names are dumped to dmesg only on
> extraordinary situations - like bugs in slab implementation, or double
> free, or detected memory leaks - where we usually do not need the name
> of the memcg that triggered the problem, because the bug is likely to be
> in the kernel subsys using the cache.

There's certainly overlap here since slab leaks triggered by a particular
workload, perhaps by usage of a particular syscall, can occur and cause
oom killing but the problem remains that neither the memcg name nor the
slab cache name may be conclusive to determine what job or user triggered
the issue. That's why we make strict demands that memcg names are always
unique and encode several key values to identify the user and job and we
don't rely on the parent.
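
A minimal sketch of the kind of naming convention described above,
assuming a hypothetical "<user>.<job>.<instance>" layout. The separator,
the set of fields, and the function names are illustrative assumptions,
not the scheme actually in use:

```python
import re

# Hypothetical layout: "<user>.<job>.<instance>". The separator and the
# field set are assumptions made for this illustration only.
SEP = "."
_FIELD = re.compile(r"[A-Za-z0-9_-]+")


def make_memcg_name(user, job, instance):
    """Build a leaf memcg name that identifies user and job by itself."""
    fields = (user, job, str(instance))
    for field in fields:
        # Reject fields that could contain the separator or a path slash,
        # so the name can always be split back unambiguously.
        if not _FIELD.fullmatch(field):
            raise ValueError("invalid field: %r" % (field,))
    return SEP.join(fields)


def parse_memcg_name(path):
    """Recover the fields from a memcg path as it might appear in a log.

    Only the last path component is inspected, so the result does not
    depend on the parent or on the depth of the hierarchy. Returns None
    for names that do not follow the convention (e.g. plain "memcg").
    """
    leaf = path.rstrip("/").rsplit("/", 1)[-1]
    parts = leaf.split(SEP)
    if len(parts) != 3 or not all(_FIELD.fullmatch(p) for p in parts):
        return None
    user, job, instance = parts
    return {"user": user, "job": job, "instance": instance}
```

With a convention like this, a name found in the kernel log maps back to
a user and job without consulting any stored table, whereas an anonymous
child name such as "memcg" yields nothing.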

I can also see the huge maintenance burden it would be to keep around a
mapping of kmem ids to {user, job} pairs just in case we later identify a
problem and in 99% of the cases would be just wasted storage.

> Plus, the names are exported to
> sysfs in case of slub, again for debugging purposes, AFAIK. So IMO the
> use cases for oom vs slab names are completely different - information
> vs debugging - and I want to export kmem.id only for the ability of
> debugging kmemcg and slab subsystems.
>

Eeek, I'm not sure I agree. I've often found that reproducing rare slab
issues is very difficult without knowledge of the workload so that I can
reproduce it. Whereas X is a very large number of machines and we see
this issue on 0.0001% of X machines, I would be required to enable this
"debugging" aid unconditionally to ever be able to map the stored kmem id
back to a user and job, that mapping would be extremely costly to
maintain, and we've gained nothing if we had already demanded that
userspace identify their memcg names with unique identifiers regardless of
where they are in the hierarchy.