On Mon, Jan 15, 2018 at 4:29 AM, Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx> wrote:
>
>
> On 01/13/2018 01:57 AM, Shakeel Butt wrote:
>> On Fri, Jan 12, 2018 at 4:24 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>>> On Fri 12-01-18 00:59:38, Andrey Ryabinin wrote:
>>>> On 01/11/2018 07:29 PM, Michal Hocko wrote:
>>> [...]
>>>>> I do not think so. Consider that this reclaim races with other
>>>>> reclaimers. Now you are reclaiming a large chunk so you might end up
>>>>> reclaiming more than necessary. SWAP_CLUSTER_MAX would reduce the over
>>>>> reclaim to be negligible.
>>>>>
>>>>
>>>> I did consider this. And I think I already explained that sort of race in a previous email.
>>>> Whether "Task B" is really a task in the cgroup or it's actually a bunch of reclaimers
>>>> doesn't matter. That doesn't change anything.
>>>
>>> I would _really_ prefer two patches here. The first one removing the
>>> hard coded reclaim count. That thing is just dubious at best. If you
>>> _really_ think that the higher reclaim target is meaningful then make
>>> it a separate patch. I am not convinced but I will not nack it either.
>>> But it will make our life much easier if my over reclaim concern is
>>> right and we will need to revert it. Conceptually those two changes are
>>> independent anyway.
>>>
>>
>> Personally I feel that the cgroup-v2 semantics are much cleaner for
>> setting the limit. There is no race with the allocators in the memcg,
>> though the oom-killer can be triggered. For cgroup-v1, the user does not
>> expect the OOM killer, and EBUSY is expected on unsuccessful reclaim. How
>> about we do something similar here and make sure the oom killer cannot be
>> triggered for the given memcg?
>>
>> // pseudo code
>> disable_oom(memcg)
>> old = xchg(&memcg->memory.limit, requested_limit)
>>
>> reclaim memory until usage gets below new limit or retries are exhausted
>>
>> if (unsuccessful) {
>>   reset_limit(memcg, old)
>>   ret = EBUSY
>> } else
>>   ret = 0;
>> enable_oom(memcg)
>>
>> This way there is no race with the allocators and the oom killer will not
>> be triggered. The processes in the memcg can suffer but that should be
>> within the expectation of the user. One disclaimer though, disabling
>> oom for memcg needs more thought.
>
> That might be worse. If the limit is too low, all allocations (except __GFP_NOFAIL of course) will start
> failing. And the kernel is not always careful enough in -ENOMEM handling.
> Also, it's not much different from oom killing everything, the end result is almost the same -
> nothing will work in that cgroup.
>

By disabling memcg oom, I meant to treat all allocations from that
memcg as __GFP_NOFAIL until oom is re-enabled. I will see if I can
convert this into actual code.

>
>> Shakeel
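
In the meantime, the pseudo code quoted above can be modelled in user-space C
as a rough sketch. Everything in it is hypothetical: struct memcg_model,
model_init(), model_reclaim() and model_set_limit() are illustration-only
names, not kernel APIs. C11 atomic_exchange() stands in for xchg(), the
oom_disabled flag is only a placeholder, and racing allocators are not
modelled at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a memcg; NOT the kernel's struct mem_cgroup. */
struct memcg_model {
	atomic_ulong limit;		/* current limit, in pages */
	atomic_ulong usage;		/* current usage, in pages */
	unsigned long reclaimable;	/* pages that reclaim can still free */
	bool oom_disabled;		/* placeholder for disable_oom()/enable_oom() */
};

static void model_init(struct memcg_model *m, unsigned long limit,
		       unsigned long usage, unsigned long reclaimable)
{
	atomic_init(&m->limit, limit);
	atomic_init(&m->usage, usage);
	m->reclaimable = reclaimable;
	m->oom_disabled = false;
}

/* Stand-in for reclaim: frees at most nr pages and returns how many were freed. */
static unsigned long model_reclaim(struct memcg_model *m, unsigned long nr)
{
	unsigned long freed = nr < m->reclaimable ? nr : m->reclaimable;

	m->reclaimable -= freed;
	atomic_fetch_sub(&m->usage, freed);
	return freed;
}

/* Returns 0 on success, or -EBUSY with the old limit restored. */
static int model_set_limit(struct memcg_model *m, unsigned long new_limit,
			   int retries)
{
	unsigned long old;
	int ret = 0;

	assert(retries >= 0);

	m->oom_disabled = true;				/* disable_oom(memcg) */
	old = atomic_exchange(&m->limit, new_limit);	/* xchg() of the limit */

	/* Reclaim until usage fits under the new limit or retries run out. */
	while (atomic_load(&m->usage) > new_limit && retries-- > 0)
		model_reclaim(m, atomic_load(&m->usage) - new_limit);

	if (atomic_load(&m->usage) > new_limit) {
		atomic_store(&m->limit, old);		/* reset_limit(memcg, old) */
		ret = -EBUSY;
	}

	m->oom_disabled = false;			/* enable_oom(memcg) */
	return ret;
}
```

With 80 pages in use and 50 reclaimable, lowering the limit from 100 to 40
succeeds in this model. With only 10 reclaimable it gives up after the
retries, puts the old limit back and returns -EBUSY.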