On Fri, Jun 17, 2016 at 12:06:55PM +0300, Vladimir Davydov wrote:
> On Wed, Jun 15, 2016 at 11:42:44PM -0400, Johannes Weiner wrote:
> > The memory controller has quite a bit of state that usually outlives
> > the cgroup and pins its CSS until said state disappears. At the same
> > time it imposes a 16-bit limit on the CSS ID space to economically
> > store IDs in the wild. Consequently, when we use cgroups to contain
> > frequent but small and short-lived jobs that leave behind some page
> > cache, we quickly run into the 64k limitations of outstanding CSSs.
> > Creating a new cgroup fails with -ENOSPC while there are only a few,
> > or even no user-visible cgroups in existence.
> >
> > Although pinning CSSs past cgroup removal is common, there are only
> > two instances that actually need a CSS ID after a cgroup is deleted:
> > cache shadow entries and swapout records.
> >
> > Cache shadow entries reference the ID weakly and can deal with the CSS
> > having disappeared when it's looked up later. They pose no hurdle.
> >
> > Swap-out records do need to pin the css to hierarchically attribute
> > swapins after the cgroup has been deleted; though the only pages that
> > remain swapped out after a process exits are tmpfs/shmem pages. Those
> > references are under the user's control and thus manageable.
> >
> > This patch introduces a private 16bit memcg ID and switches swap and
> > cache shadow entries over to using that. It then decouples the CSS
> > lifetime from the CSS ID lifetime, such that a CSS ID can be recycled
> > when the CSS is only pinned by common objects that don't need an ID.
>
> There's already id which is only used for online memory cgroups - it's
> kmemcg_id. May be, instead of introducing one more idr, we could name it
> generically and reuse it for shadow entries?

Good point. But it seems mem_cgroup_idr is more generic, it makes
sense to switch slab accounting over to that. I'll look into that, but
as a refactoring patch on top of this fix.

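To make the lifetime split described in the quoted changelog concrete, here is a rough user-space model in Python. It is only a sketch and not the kernel code: `IdSpace`, `Group`, `get_id_ref` and `put_id_ref` are hypothetical names invented for illustration. The idea it shows is that the ID carries its own reference count, so it can go back to the pool as soon as the group is offline and no swap-style record pins it, even if the group object itself lives on.

```python
import errno


class IdSpace:
    """Hypothetical 16-bit ID pool (IDs 1..65535, 0 means 'no ID')."""

    MAX_ID = 0xFFFF

    def __init__(self):
        self._free = []    # IDs that were released and can be reused
        self._next = 1     # next never-used ID
        self._owner = {}   # live ID -> owning group

    def alloc(self, owner):
        if self._free:
            new_id = self._free.pop()
        elif self._next <= self.MAX_ID:
            new_id = self._next
            self._next += 1
        else:
            raise OSError(errno.ENOSPC, "ID space exhausted")
        self._owner[new_id] = owner
        return new_id

    def lookup(self, group_id):
        # Weak lookup, as a shadow entry would do it: the caller must
        # cope with None (or with a newer group after the ID was reused).
        return self._owner.get(group_id)

    def release(self, group_id):
        del self._owner[group_id]
        self._free.append(group_id)


class Group:
    """Stand-in for a memcg; the object may outlive its ID."""

    def __init__(self, ids):
        self.ids = ids
        self.id = ids.alloc(self)
        self.id_refs = 1   # reference held while the group is online

    def get_id_ref(self):
        # Taken by records that need the ID later, e.g. a swap record.
        self.id_refs += 1

    def put_id_ref(self):
        self.id_refs -= 1
        if self.id_refs == 0:
            self.ids.release(self.id)
            self.id = 0

    def offline(self):
        # Dropping the online reference; the object itself stays alive.
        self.put_id_ref()
```

In this model a group that goes offline with one outstanding swap-style reference keeps its ID until that reference is dropped; after that, `lookup()` returns `None` and the next `Group` can pick up the same number.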
> Regarding swap entries, would it really make much difference if we used
> 4 bytes per swap page instead of 2? For a 100 GB swap it'd increase
> overhead from 50 MB up to 100 MB, which still doesn't seem too much IMO,
> so may be just use plain unrestricted css->id for swap entries?

Yes and no. I agree that the increased consumption wouldn't be too
crazy, but if we have to maintain a 16-bit ID anyway, we might as well
use it for swap too to save that space. I don't think tmpfs and shmem
pins past offlining will be common enough to significantly eat into
the ID space of online cgroups.
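
For reference, the 50 MB vs. 100 MB figures can be reproduced with a few lines of Python. This is just a back-of-the-envelope check; it assumes 4 KiB pages (the thread does not state a page size) and binary units, and `swap_id_overhead` is a made-up helper name.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages, as on x86-64
GIB = 1 << 30
MIB = 1 << 20


def swap_id_overhead(swap_bytes, id_bytes, page_size=PAGE_SIZE):
    """Bytes needed to store one cgroup ID per swap page."""
    pages = swap_bytes // page_size
    return pages * id_bytes


swap = 100 * GIB
print(swap_id_overhead(swap, 2) // MIB, "MiB with 2-byte IDs")
print(swap_id_overhead(swap, 4) // MIB, "MiB with 4-byte IDs")
```

With 100 GiB of swap that is 26,214,400 pages, so 2 bytes per page comes to 50 MiB and 4 bytes per page to 100 MiB.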