Hi Luigi,

On Tue, Apr 23, 2019 at 11:58 AM Luigi Semenzato <semenzato@xxxxxxxxxx> wrote:
>
> I and others are working on improving system behavior under memory
> pressure on Chrome OS. We use zram, which swaps to a
> statically-configured compressed RAM disk. One challenge that we have
> is that the footprint of our workloads is highly variable. With zram,
> we have to set the size of the swap partition at boot time. When the
> (logical) swap partition is full, we're left with some amount of RAM
> usable by file and anonymous pages (we can ignore the rest). We don't
> get to control this amount dynamically. Thus if the workload fits
> nicely in it, everything works well. If it doesn't, then the rate of
> anonymous page faults can be quite high, causing large CPU overhead
> for compression/decompression (as well as for other parts of the MM).
>
> In Chrome OS and Android, we have the luxury that we can reduce
> pressure by terminating processes (tab discard in Chrome OS, app kill
> in Android---which incidentally also runs in parallel with Chrome OS
> on some chromebooks). To help decide when to reduce pressure, we
> would like to have a reliable and device-independent measure of MM CPU
> overhead. I have looked into PSI and have a few questions. I am also
> looking for alternative suggestions.
>
> PSI measures the times spent when some and all tasks are blocked by
> memory allocation. In some experiments, this doesn't seem to
> correlate too well with CPU overhead (which instead correlates fairly
> well with page fault rates). Could this be because it includes
> pressure from file page faults?

This might be caused by thrashing (see:
https://elixir.bootlin.com/linux/v5.1-rc6/source/mm/filemap.c#L1114).

> Is there some way of interpreting PSI
> numbers so that the pressure from file pages is ignored?

I don't think so, but I might be wrong. Notice here
https://elixir.bootlin.com/linux/v5.1-rc6/source/mm/filemap.c#L1111
that you could probably use delayacct to distinguish file thrashing;
however, remember that PSI takes into account the number of CPUs and
the number of currently non-idle tasks in its pressure calculations,
so the raw delay numbers might not be very useful here.

> What is the purpose of "some" and "full" in the PSI measurements? The
> chrome browser is a multi-process app and there is a lot of IPC. When
> process A is blocked on memory allocation, it cannot respond to IPC
> from process B, thus effectively both processes are blocked on
> allocation, but we don't see that.

I don't think PSI would account for such an indirect stall, where A is
waiting for B and B is blocked on memory access. B's stall will be
accounted for, but I don't think A's blocked time will go into the PSI
calculations. The process inter-dependencies are probably out of scope
for PSI.

> Also, there are situations in
> which some "uninteresting" process keep running. So it's not clear we
> can rely on "full". Or maybe I am misunderstanding? "Some" may be a
> better measure, but again it doesn't measure indirect blockage.

Johannes explains the SOME and FULL calculations here:
https://elixir.bootlin.com/linux/v5.1-rc6/source/kernel/sched/psi.c#L76
and includes a couple of examples, the last one showing FULL > 0 while
some tasks are still running.

> The kernel contains various cpustat measurements, including some
> slightly esoteric ones such as CPUTIME_GUEST and CPUTIME_GUEST_NICE.
> Would adding a CPUTIME_MEM be out of the question?
>
> Thanks!
>

Just my 2 cents; Johannes, being the author, might have more to say
here.
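
P.S. In case it helps, here is a minimal sketch (mine, not something
from the PSI docs) of how userspace can sample the "some" and "full"
averages that PSI exports through /proc/pressure/memory. It assumes a
kernel built with CONFIG_PSI (and booted with psi=1 if PSI is disabled
by default), and that the field layout matches v5.1
("some avg10=... avg60=... avg300=... total=..."); treat it as an
illustration rather than a reference implementation.

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/pressure/memory", "r");
	char line[256], kind[8];
	double avg10, avg60, avg300;
	unsigned long long total;

	if (!f) {
		perror("fopen /proc/pressure/memory");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		/* e.g. "some avg10=1.23 avg60=0.45 avg300=0.10 total=12345" */
		if (sscanf(line,
			   "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
			   kind, &avg10, &avg60, &avg300, &total) == 5)
			printf("%s: %.2f%% of the last 10s stalled on memory (total %llu us)\n",
			       kind, avg10, total);
	}
	fclose(f);
	return 0;
}

Note that these averages already fold in the CPU count and the
non-idle task count (per the psi.c comment referenced above:
threads = min(nr_nonidle_tasks, nr_cpus),
SOME = min(nr_delayed_tasks / threads, 1),
FULL = (threads - min(nr_running_tasks, threads)) / threads), which is
part of why they are more comparable across devices than raw delay
totals would be.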