Hi Luigi,

On Tue, Apr 23, 2019 at 11:58 AM Luigi Semenzato <semenzato@xxxxxxxxxx> wrote:
>
> I and others are working on improving system behavior under memory
> pressure on Chrome OS. We use zram, which swaps to a
> statically-configured compressed RAM disk. One challenge that we have
> is that the footprint of our workloads is highly variable. With zram,
> we have to set the size of the swap partition at boot time. When the
> (logical) swap partition is full, we're left with some amount of RAM
> usable by file and anonymous pages (we can ignore the rest). We don't
> get to control this amount dynamically. Thus if the workload fits
> nicely in it, everything works well. If it doesn't, then the rate of
> anonymous page faults can be quite high, causing large CPU overhead
> for compression/decompression (as well as for other parts of the MM).
>
> In Chrome OS and Android, we have the luxury that we can reduce
> pressure by terminating processes (tab discard in Chrome OS, app kill
> in Android---which incidentally also runs in parallel with Chrome OS
> on some chromebooks). To help decide when to reduce pressure, we
> would like to have a reliable and device-independent measure of MM CPU
> overhead. I have looked into PSI and have a few questions. I am also
> looking for alternative suggestions.
>
> PSI measures the times spent when some and all tasks are blocked by
> memory allocation. In some experiments, this doesn't seem to
> correlate too well with CPU overhead (which instead correlates fairly
> well with page fault rates). Could this be because it includes
> pressure from file page faults?

This might be caused by thrashing (see:
https://elixir.bootlin.com/linux/v5.1-rc6/source/mm/filemap.c#L1114).

> Is there some way of interpreting PSI
> numbers so that the pressure from file pages is ignored?

I don't think so, but I might be wrong. Notice here
https://elixir.bootlin.com/linux/v5.1-rc6/source/mm/filemap.c#L1111
that you could probably use delayacct to distinguish file thrashing;
however, remember that PSI takes into account the number of CPUs and
the number of currently non-idle tasks in its pressure calculations,
so the raw delay numbers might not be very useful here.

> What is the purpose of "some" and "full" in the PSI measurements? The
> chrome browser is a multi-process app and there is a lot of IPC. When
> process A is blocked on memory allocation, it cannot respond to IPC
> from process B, thus effectively both processes are blocked on
> allocation, but we don't see that.

I don't think PSI would account for such an indirect stall, where A is
waiting for B and B is blocked on memory access. B's stall will be
accounted for, but I don't think A's blocked time will go into the PSI
calculations. The process inter-dependencies are probably out of scope
for PSI.

> Also, there are situations in
> which some "uninteresting" process keep running. So it's not clear we
> can rely on "full". Or maybe I am misunderstanding? "Some" may be a
> better measure, but again it doesn't measure indirect blockage.

Johannes explains the SOME and FULL calculations here:
https://elixir.bootlin.com/linux/v5.1-rc6/source/kernel/sched/psi.c#L76
and includes a couple of examples, the last one showing FULL > 0 while
some tasks are still running.

> The kernel contains various cpustat measurements, including some
> slightly esoteric ones such as CPUTIME_GUEST and CPUTIME_GUEST_NICE.
> Would adding a CPUTIME_MEM be out of the question?
>
> Thanks!
>

Just my 2 cents; Johannes, being the author, might have more to say
here.
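
P.S. In case it helps, here is a minimal sketch (mine, not something
from the PSI docs) of how userspace can sample the "some" and "full"
averages that PSI exports through /proc/pressure/memory. It assumes a
kernel built with CONFIG_PSI (and booted with psi=1 if PSI is disabled
by default), and that the field layout matches v5.1
("some avg10=... avg60=... avg300=... total=..."); treat it as an
illustration rather than a reference implementation.

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/pressure/memory", "r");
	char line[256], kind[8];
	double avg10, avg60, avg300;
	unsigned long long total;

	if (!f) {
		perror("fopen /proc/pressure/memory");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		/* e.g. "some avg10=1.23 avg60=0.45 avg300=0.10 total=12345" */
		if (sscanf(line,
			   "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
			   kind, &avg10, &avg60, &avg300, &total) == 5)
			printf("%s: %.2f%% of the last 10s stalled on memory (total %llu us)\n",
			       kind, avg10, total);
	}
	fclose(f);
	return 0;
}

Note that these averages already fold in the CPU count and the
non-idle task count (per the psi.c comment referenced above:
threads = min(nr_nonidle_tasks, nr_cpus),
SOME = min(nr_delayed_tasks / threads, 1),
FULL = (threads - min(nr_running_tasks, threads)) / threads), which is
part of why they are more comparable across devices than raw delay
totals would be.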