Re: Memory CG and 5.1 to 5.6 upgrade slows backup

Bruno Prémont <bonbons@xxxxxxxxxxxxxxxxx> · Thu, 9 Apr 2020 17:40:42 +0200

On Thu, 9 Apr 2020 16:24:17 +0100 wrote:

> Bruno Prémont writes:
> >Could it be that cache is being prevented from being reclaimed by a task
> >in another cgroup?
> >
> >e.g.
> >  cgroup/system/backup
> >    first reads $files (reads each once)
> >  cgroup/workload/bla
> >    second&more reads $files
> >
> >Would $files remain associated to cgroup/system/backup and not
> >reclaimed there instead of being reassigned to cgroup/workload/bla?  
> 
> Yes, that's entirely possible. The first cgroup to fault in the pages is 
> charged for the memory. Other cgroups may use them, but they are not accounted 
> for as part of that other cgroup. They may also still be "active" as a result 
> of use by another cgroup.

But the memory would then be 'active' in the original cgroup, which I
feel is not the case here.
If the pages remain inactive-unreclaimable in the first cgroup due to use
in another cgroup, that would be at least surprising.

Doubling the memory.high value helped (but for how long?): memory.current
is back around memory.high, but there is no throttling yet. From the
increase until now, memory.pressure has been small/zero.

Capturing 
  memory.stat:pgscan 47519855
  memory.stat:pgsteal 44933838
over time for Michal and will report back later this evening.
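
A minimal sketch of how such a capture could be done, in case someone
wants to reproduce it. It re-reads memory.stat at a fixed interval and
prints the pgscan/pgsteal deltas. The cgroup directory passed to sample(),
the interval and the sample count are assumptions for illustration, not
what I actually ran.

```python
import time
from pathlib import Path


def parse_memory_stat(text):
    """Parse the 'key value' lines of a cgroup v2 memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def reclaim_delta(prev, cur):
    """Return (pages scanned, pages reclaimed, reclaimed/scanned) between two samples."""
    scanned = cur.get("pgscan", 0) - prev.get("pgscan", 0)
    stolen = cur.get("pgsteal", 0) - prev.get("pgsteal", 0)
    ratio = stolen / scanned if scanned > 0 else None
    return scanned, stolen, ratio


def sample(cgroup_dir, interval=10.0, count=6):
    """Print reclaim deltas for cgroup_dir every `interval` seconds, `count` times.

    cgroup_dir is a hypothetical path such as /sys/fs/cgroup/system/backup.
    """
    stat_file = Path(cgroup_dir) / "memory.stat"
    prev = parse_memory_stat(stat_file.read_text())
    for _ in range(count):
        time.sleep(interval)
        cur = parse_memory_stat(stat_file.read_text())
        print(time.strftime("%H:%M:%S"), *reclaim_delta(prev, cur))
        prev = cur
```

A pgsteal delta well below the pgscan delta would suggest reclaim is
scanning pages it cannot free.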

When seen stuck, the backup was reading a multi-GiB file with
  open(, O_NOATIME)
  while (read()) {
    transform and write to network
  }
  close()
thus a plain sequential file read through the file cache (and for this
backup run, only files not in use by anyone else, or some being
just appended to by others).
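
For anyone wanting to reproduce the access pattern, a rough stand-in for
that loop could look like the sketch below. This is not the backup tool's
code: stream_file, sink and chunk_size are hypothetical names, and sink
merely stands in for the "transform and write to network" step. Note that
O_NOATIME is Linux-specific and only permitted for the file's owner (or
with CAP_FOWNER).

```python
import os


def stream_file(path, sink, chunk_size=1 << 20):
    """Read `path` sequentially through the page cache, passing each chunk to `sink`.

    Returns the total number of bytes read.
    """
    # O_NOATIME only exists on Linux; fall back to a plain read-only open elsewhere.
    flags = os.O_RDONLY | getattr(os, "O_NOATIME", 0)
    fd = os.open(path, flags)
    total = 0
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty read means end of file
                break
            sink(chunk)
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```

Each page is touched exactly once here, so the cache it leaves behind
should be cheap to reclaim.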

Bruno