Hi Greg,

On 03/27/2014 08:31 AM, Greg Thelen wrote:
> On Wed, Mar 26 2014, Vladimir Davydov <vdavydov@xxxxxxxxxxxxx> wrote:
>
>> We don't track any random page allocation, so we shouldn't track kmalloc
>> that falls back to the page allocator.
> This seems like a change which will lead to confusing (and arguably
> improper) kernel behavior. I prefer the behavior prior to this patch.
>
> Before this change both of the following allocations are charged to
> memcg (assuming kmem accounting is enabled):
>  a = kmalloc(KMALLOC_MAX_CACHE_SIZE, GFP_KERNEL)
>  b = kmalloc(KMALLOC_MAX_CACHE_SIZE + 1, GFP_KERNEL)
>
> After this change only 'a' is charged; 'b' goes directly to page
> allocator which no longer does accounting.

Why do we need to charge 'b' in the first place? Can userspace trigger
such allocations en masse? If there can only be one or two such
allocations from a cgroup, is there any point in charging them?

In fact, do we actually need to charge every random kmem allocation? I
guess not. For instance, filesystems often allocate data shared among
all the FS users. It's wrong to charge such allocations to a particular
memcg, IMO. So the next step is going to be adding a per-kmem-cache
flag specifying whether allocations from that cache should be charged,
so that accounting applies only to caches that are explicitly marked.

There is one more argument for removing kmalloc_large accounting: we
don't have an easy way to track such allocations, which prevents us
from reparenting kmemcg charges on css offline. Of course, we could
link kmalloc_large pages into some sort of per-memcg list, which would
allow us to find them on css offline, but I don't think such a
complication is justified.

Thanks.