Re: PSI vs. CPU overhead for client computing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thank you, I can try to do that.

It's not trivial to get right though.  I have to find the right
compromise.  A horribly wrong patch won't be taken seriously, but a
completely correct one would be a bit too much work, given the
probability that it will get rejected.

Thanks also to Johannes for the clarification!

On Wed, Apr 24, 2019 at 7:49 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Tue, Apr 23, 2019 at 9:54 PM Luigi Semenzato <semenzato@xxxxxxxxxx> wrote:
> >
> > Thank you very much Suren.
> >
> > On Tue, Apr 23, 2019 at 3:04 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > >
> > > Hi Luigi,
> > >
> > > On Tue, Apr 23, 2019 at 11:58 AM Luigi Semenzato <semenzato@xxxxxxxxxx> wrote:
> > > >
> > > > I and others are working on improving system behavior under memory
> > > > pressure on Chrome OS.  We use zram, which swaps to a
> > > > statically-configured compressed RAM disk.  One challenge that we have
> > > > is that the footprint of our workloads is highly variable.  With zram,
> > > > we have to set the size of the swap partition at boot time.  When the
> > > > (logical) swap partition is full, we're left with some amount of RAM
> > > > usable by file and anonymous pages (we can ignore the rest).  We don't
> > > > get to control this amount dynamically.  Thus if the workload fits
> > > > nicely in it, everything works well.  If it doesn't, then the rate of
> > > > anonymous page faults can be quite high, causing large CPU overhead
> > > > for compression/decompression (as well as for other parts of the MM).
> > > >
> > > > In Chrome OS and Android, we have the luxury that we can reduce
> > > > pressure by terminating processes (tab discard in Chrome OS, app kill
> > > > in Android---which incidentally also runs in parallel with Chrome OS
> > > > on some chromebooks).  To help decide when to reduce pressure, we
> > > > would like to have a reliable and device-independent measure of MM CPU
> > > > overhead.  I have looked into PSI and have a few questions.  I am also
> > > > looking for alternative suggestions.
> > > >
> > > > PSI measures the times spent when some and all tasks are blocked by
> > > > memory allocation.  In some experiments, this doesn't seem to
> > > > correlate too well with CPU overhead (which instead correlates fairly
> > > > well with page fault rates).  Could this be because it includes
> > > > pressure from file page faults?
> > >
> > > This might be caused by thrashing (see:
> > > https://elixir.bootlin.com/linux/v5.1-rc6/source/mm/filemap.c#L1114).
> > >
> > > >  Is there some way of interpreting PSI
> > > > numbers so that the pressure from file pages is ignored?
> > >
> > > I don't think so but I might be wrong. Notice here
> > > https://elixir.bootlin.com/linux/v5.1-rc6/source/mm/filemap.c#L1111
> > > you could probably use delayacct to distinguish file thrashing,
> > > however remember that PSI takes into account the number of CPUs and
> > > the number of currently non-idle tasks in its pressure calculations,
> > > so the raw delay numbers might not be very useful here.
> >
> > OK.
> >
> > > > What is the purpose of "some" and "full" in the PSI measurements?  The
> > > > chrome browser is a multi-process app and there is a lot of IPC.  When
> > > > process A is blocked on memory allocation, it cannot respond to IPC
> > > > from process B, thus effectively both processes are blocked on
> > > > allocation, but we don't see that.
> > >
> > > I don't think PSI would account such an indirect stall when A is
> > > waiting for B and B is blocked on memory access. B's stall will be
> > > accounted for but I don't think A's blocked time will go into PSI
> > > calculations. The process inter-dependencies are probably out of scope
> > > for PSI.
> >
> > Right, that's what I was also saying.  It would be near impossible to
> > figure it out.  It may also be that statistically it doesn't matter,
> > as long as the workload characteristics don't change dramatically.
> > Which unfortunately they might...
> >
> > > > Also, there are situations in
> > > > which some "uninteresting" process keep running.  So it's not clear we
> > > > can rely on "full".  Or maybe I am misunderstanding?  "Some" may be a
> > > > better measure, but again it doesn't measure indirect blockage.
> > >
> > > Johannes explains the SOME and FULL calculations here:
> > > https://elixir.bootlin.com/linux/v5.1-rc6/source/kernel/sched/psi.c#L76
> > > and includes couple examples with the last one showing FULL>0 and some
> > > tasks still running.
> >
> > Thank you, yes, those are good explanation.  I am still not sure how
> > to use this in our case.
> >
> > I thought about using the page fault rate as a proxy for the
> > allocation overhead.  Unfortunately it is difficult to figure out the
> > baseline, because: 1. it is device-dependent (that's not
> > insurmountable: we could compute a per-device baseline offline); 2.
> > the CPUs can go in and out of turbo mode, or temperature-throttling,
> > and the notion of a constant "baseline" fails miserably.
> >
> > > > The kernel contains various cpustat measurements, including some
> > > > slightly esoteric ones such as CPUTIME_GUEST and CPUTIME_GUEST_NICE.
> > > > Would adding a CPUTIME_MEM be out of the question?
> >
> > Any opinion on CPUTIME_MEM?
>
> I guess some description of how you plan to calculate it would be
> helpful. A simple raw delay counter might not be very useful, that's
> why PSI performs more elaborate calculations.
> Maybe posting a small RFC patch with code would get more attention and
> you can collect more feedback.
>
> > Thanks again!
> >
> > > > Thanks!
> > > >
> > >
> > > Just my 2 cents and Johannes being the author might have more to say here.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux