On Mon 21-07-14 15:48:39, Vladimir Davydov wrote:
> On Mon, Jul 21, 2014 at 11:07:24AM +0200, Michal Hocko wrote:
> > On Fri 18-07-14 19:44:43, Vladimir Davydov wrote:
> > > On Wed, Jul 16, 2014 at 11:58:14AM -0400, Johannes Weiner wrote:
> > > > On Wed, Jul 16, 2014 at 04:39:38PM +0200, Michal Hocko wrote:
> > > > > +#ifdef CONFIG_MEMCG_KMEM
> > > > > +	{
> > > > > +		.name = "kmem.limit_in_bytes",
> > > > > +		.private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
> > > > > +		.write = mem_cgroup_write,
> > > > > +		.read_u64 = mem_cgroup_read_u64,
> > > > > +	},
> > > >
> > > > Does it really make sense to have a separate limit for kmem only?
> > > > IIRC, the reason we introduced this was that this memory is not
> > > > reclaimable and so we need to limit it.
> > > >
> > > > But the opposite effect happened: because it's not reclaimable, the
> > > > separate kmem limit is actually unusable for any values smaller than
> > > > the overall memory limit: because there is no reclaim mechanism for
> > > > that limit, once you hit it, it's over, there is nothing you can do
> > > > anymore. The problem isn't so much unreclaimable memory, the problem
> > > > is unreclaimable limits.
> > > >
> > > > If the global case produces memory pressure through kernel memory
> > > > allocations, we reclaim page cache, anonymous pages, inodes, dentries
> > > > etc. I think the same should happen for kmem: kmem should just be
> > > > accounted and limited in the overall memory limit of a group, and when
> > > > pressure arises, we go after anything that's reclaimable.
> > >
> > > Personally, I don't think there's much sense in having a separate knob
> > > for kmem limit either. Until we have a user with a sane use case for it,
> > > let's not propagate it to the new interface.
> >
> > What about fork-bomb protection? I thought that was the primary use case
> > for K < U? Or how can we handle that use case with a single limit? A
> > special gfp flag to not trigger OOM path when called from some kmem
> > charge paths?
>
> Hmm, for a moment I thought that putting a fork-bomb inside a memory
> cgroup with kmem accounting enabled and K=U will isolate it from the
> rest of the system and therefore there's no need in K<U, but now I
> realize it's not quite right.
>
> In contrast to user memory, thread stack allocations have costly order,
> they cannot be swapped out, and on 32-bit systems they will consume a
> limited resource of low mem. Although the latter two don't look like
> much of a concern, the costly order of stack pages certainly does, I
> think.
>
> Is this what you mean by saying we have to disable OOM from some kmem
> charge paths? To prevent OOM on the global level that might trigger due
> to lack of high order pages for task stack?

No, I meant it for a different reason. If you simply cause OOM from e.g.
a stack charge, then you DoS your own cgroup before you start effectively
stopping the fork-bomb, because the fork-bomb will usually have a much
smaller RSS than anything else in the group. So this is a case where you
really want to fail the allocation. Maybe I just didn't understand what
a single-limit proposal meant...

> > What about task_count or what was the name of the controller which was
> > dropped and suggested to be replaced by kmem accounting? I can imagine
> > that to be implemented by a separate K limit which would be roughly
> > stack_size * task_count + pillow for slab.
>
> I wonder how big this pillow for slab should be...

Well, it obviously depends on the load running in the group. It depends
on the amount of unreclaimable slab + reclaimable_and_still_not_thrashing
amount of slab. So the pillow should be quite large, but that shouldn't
be a big deal as kernel allocations are usually a small part of U.
--
Michal Hocko
SUSE Labs
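To make the "stack_size * task_count + pillow for slab" estimate from the thread concrete, here is a minimal userspace sketch of the arithmetic. The function name and parameters are hypothetical and are not a kernel interface; the slab pillow is an input the administrator would have to pick per workload, as discussed above.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): estimate a kmem limit K that
 * roughly caps a group at max_tasks tasks.
 *
 *   K ~= stack_size * max_tasks + slab_pillow
 *
 * stack_size  - kernel stack size per task, in bytes (assumed input)
 * max_tasks   - number of tasks the group should be allowed to have
 * slab_pillow - headroom for slab, in bytes (workload dependent)
 *
 * The result saturates at UINT64_MAX instead of wrapping on overflow.
 */
static uint64_t kmem_limit_estimate(uint64_t stack_size,
				    uint64_t max_tasks,
				    uint64_t slab_pillow)
{
	uint64_t stacks;

	/* Guard the multiplication against overflow. */
	if (max_tasks != 0 && stack_size > UINT64_MAX / max_tasks)
		return UINT64_MAX;
	stacks = stack_size * max_tasks;

	/* Guard the addition against overflow. */
	if (slab_pillow > UINT64_MAX - stacks)
		return UINT64_MAX;

	return stacks + slab_pillow;
}
```

For example, assuming 16 KiB stacks, 1024 tasks and a 64 MiB pillow (all illustrative numbers), the estimate is 16 MiB + 64 MiB = 80 MiB, which would then be the value to write to kmem.limit_in_bytes.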