On Wed, Feb 19, 2020 at 08:53:32PM +0100, Michal Hocko wrote:
> On Wed 19-02-20 14:16:18, Johannes Weiner wrote:
> > On Wed, Feb 19, 2020 at 07:37:31PM +0100, Michal Hocko wrote:
> > > On Wed 19-02-20 13:12:19, Johannes Weiner wrote:
> > > > This patch adds asynchronous reclaim to the memory.high cgroup limit
> > > > while keeping direct reclaim as a fallback. In our testing, this
> > > > eliminated all direct reclaim from the affected workload.
> > >
> > > Who is accounted for all the work? Unless I am missing something this
> > > just gets hidden in the system activity and that might hurt the
> > > isolation. I do see how moving the work to a different context is
> > > desirable but this work has to be accounted properly when it is going to
> > > become a normal mode of operation (rather than a rare exception like the
> > > existing irq context handling).
> >
> > Yes, the plan is to account it to the cgroup on whose behalf we're
> > doing the work.

How are you planning to do that?

I've been thinking about how to account a kernel thread's CPU usage to a
cgroup on and off while working on the parallelizing Michal mentions
below. A few approaches are described here:

https://lore.kernel.org/linux-mm/20200212224731.kmss6o6agekkg3mw@xxxxxxxxxxxxxxxxxxxxxxxxxx/

> shows that the amount of the work required for the high limit reclaim
> can be non-trivial. Somebody has to do that work and we cannot simply
> allow everybody else to pay for that.
>
> > The problem is that we have a general lack of usable CPU control right
> > now - see Rik's work on this: https://lkml.org/lkml/2019/8/21/1208.
> > For workloads that are contended on CPU, we cannot enable the CPU
> > controller because the scheduling latencies are too high. And for
> > workloads that aren't CPU contended, well, it doesn't really matter
> > where the reclaim cycles are accounted to.
> >
> > Once we have the CPU controller up to speed, we can add annotations
> > like these to account stretches of execution to specific
> > cgroups. There just isn't much point to do it before we can actually
> > enable CPU control on the real workloads where it would matter.

Which annotations do you mean? I didn't see them when skimming through
Rik's work or in this patch.