On Fri, Feb 11, 2022 at 4:13 AM Chris Down <chris@xxxxxxxxxxxxxx> wrote: > [...] > >To make high limit enforcement more robust, this patch makes the limit > >enforcement synchronous only if the accumulated overcharge becomes > >larger than MEMCG_CHARGE_BATCH. So, most of the allocations would still > >be throttled on the return-to-userspace path but only the extreme > >allocations which accumulates large amount of overcharge without > >returning to the userspace will be throttled synchronously. The value > >MEMCG_CHARGE_BATCH is a bit arbitrary but most of other places in the > >memcg codebase uses this constant therefore for now uses the same one. > > Note that mem_cgroup_handle_over_high() has its own allocator throttling grace > period, where it bails out if the penalty to apply is less than 10ms. The > reclaim will still happen, though. So throttling might not happen even for > roughly MEMCG_CHARGE_BATCH-sized allocations, depending on the overall size of > the cgroup and its protection. > Here by throttling, I meant both reclaim and schedule_timeout_killable(). I don't want to say low level details which might change in future. [...] > > Thanks, I was going to comment on v1 that I prefer to keep the implementation > of mem_cgroup_handle_over_high if possible since we know that the mechanism has > been safe in production over the past few years. > > One question I have is about throttling. It looks like this new > mem_cgroup_handle_over_high callsite may mean that throttling is invoked more > than once on a misbehaving workload that's failing to reclaim since the > throttling could be invoked both here and in return to userspace, right? That > might not be a problem, but we should think about the implications of that, > especially in relation to MEMCG_MAX_HIGH_DELAY_JIFFIES. > Please note that mem_cgroup_handle_over_high() clears memcg_nr_pages_over_high and if on the return-to-userspace path mem_cgroup_handle_over_high() finds that memcg_nr_pages_over_high is non-zero, then it means the task has further accumulated the charges over high limit after a possibly synchronous memcg_nr_pages_over_high() call. > Maybe we should record if throttling happened previously and avoid doing it > again for this entry into kernelspace? Not certain that's the right answer, but > we should think about what the new semantics should be. For now, I will keep this as is and will add a comment in the code and a mention in the commit message about it. I will wait for others to comment before sending the next version and thanks for taking a look.