On 02/03/2014 03:04 PM, David Rientjes wrote:
> On Mon, 3 Feb 2014, Vladimir Davydov wrote:
>
>> AFAIU, cgroup identifiers dumped on oom (cgroup paths, currently) and
>> memcg slab cache names serve different purposes.
>
> Sure, you may dump the name for a number of legitimate reasons, but the
> problem still exists that it's difficult to determine which memcg is
> being referenced without a flat hierarchy and unique memcg names for
> all children.
>
>> The point is that oom is a perfectly normal situation for the kernel,
>> and the info dumped to dmesg is for the admin to find out the cause of
>> the problem (a greedy user or cgroup).
>
> Hmm, so if we hand out top-level memcgs to individual jobs or users,
> like our userspace does, and they are able to configure their child
> memcgs as they wish, and then they or the admin finds in the kernel log
> that a memory hog was killed from the memcg with the perfectly
> anonymous name of "memcg", how do we determine what job or user
> triggered that kill? The user id is not going to be conclusive in a
> production environment with shared user accounts.
>
>> On the other hand, slab cache names are dumped to dmesg only in
>> extraordinary situations - bugs in the slab implementation, a double
>> free, or detected memory leaks - where we usually do not need the name
>> of the memcg that triggered the problem, because the bug is likely to
>> be in the kernel subsystem using the cache.
>
> There's certainly overlap here, since slab leaks triggered by a
> particular workload, perhaps by usage of a particular syscall, can
> occur and cause oom killing, but the problem remains that neither the
> memcg name nor the slab cache name may be conclusive enough to
> determine what job or user triggered the issue. That's why we make the
> strict demand that memcg names are always unique and encode several key
> values identifying the user and job, and why we don't rely on the
> parent.
>
> I can also see the huge maintenance burden it would be to keep around a
> mapping of kmem ids to {user, job} pairs just in case we later identify
> a problem; in 99% of the cases it would be just wasted storage.
>
>> Plus, in the case of slub the names are exported to sysfs, again for
>> debugging purposes, AFAIK. So IMO the use cases for oom vs slab names
>> are completely different - information vs debugging - and I want to
>> export kmem.id only for the ability to debug the kmemcg and slab
>> subsystems.
>
> Eeek, I'm not sure I agree. I've often found that rare slab issues are
> very difficult to reproduce without knowledge of the workload. Where X
> is a very large number of machines and we see an issue on 0.0001% of
> them, I would have to enable this "debugging" aid unconditionally to
> ever be able to map the stored kmem id back to a user and job; that
> mapping would be extremely costly to maintain, and we would have gained
> nothing if we had already demanded that userspace give their memcgs
> unique names regardless of where they are in the hierarchy.

I see your point, and it sounds quite reasonable to me. So I guess I'll
drop the patch removing the cgroup name part from slab cache names
(patch 2) and resend.

Thanks.
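
As an aside, here is a minimal userspace sketch of the kind of per-memcg
slab cache naming being discussed, in a "<root cache>(<kmem id>:<cgroup
name>)" style. The format string and the memcg_cache_name() helper are
illustrative assumptions, not the kernel's actual implementation;
asprintf() merely stands in for the kernel's kasprintf():

/*
 * Minimal userspace sketch (NOT kernel code) of composing a per-memcg
 * slab cache name.  Format and helper name are assumptions made for
 * illustration only.
 */
#define _GNU_SOURCE		/* for asprintf() */
#include <stdio.h>
#include <stdlib.h>

static char *memcg_cache_name(const char *root_cache, int kmem_id,
			      const char *cgroup_name)
{
	char *name;

	if (asprintf(&name, "%s(%d:%s)", root_cache, kmem_id, cgroup_name) < 0)
		return NULL;
	return name;
}

int main(void)
{
	/* An anonymous cgroup name tells the admin nothing... */
	char *anon = memcg_cache_name("dentry", 2, "memcg");
	/* ...while a name encoding user and job is self-identifying. */
	char *uniq = memcg_cache_name("dentry", 2, "alice-webserv-0423");

	if (anon && uniq)
		printf("%s\n%s\n", anon, uniq);
	free(anon);
	free(uniq);
	return 0;
}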
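
And a sketch of the admin-side mapping David describes as a maintenance
burden: walk the memory controller hierarchy and record a kmem id ->
cgroup path table. This assumes the proposed kmem.id knob would be
exported as a memory.kmem.id file, and the mount point is likewise an
assumption:

/*
 * Hedged sketch: build an id -> cgroup-path mapping, assuming the
 * proposed memory.kmem.id file existed.  File name and mount point are
 * assumptions based on the patch under discussion.
 */
#define _XOPEN_SOURCE 500	/* for nftw() */
#include <stdio.h>
#include <sys/stat.h>
#include <ftw.h>

static int visit(const char *path, const struct stat *sb, int flag,
		 struct FTW *ftwbuf)
{
	char idfile[4096];
	FILE *f;
	int id;

	(void)sb;
	(void)ftwbuf;
	if (flag != FTW_D)
		return 0;	/* only cgroup directories are interesting */
	snprintf(idfile, sizeof(idfile), "%s/memory.kmem.id", path);
	f = fopen(idfile, "r");
	if (!f)
		return 0;	/* kernel without the knob, or unrelated dir */
	if (fscanf(f, "%d", &id) == 1)
		printf("%d\t%s\n", id, path);	/* one id -> path entry */
	fclose(f);
	return 0;
}

int main(void)
{
	/* The memory controller mount point is an assumption. */
	return nftw("/sys/fs/cgroup/memory", visit, 16, FTW_PHYS);
}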