On 08/28/2018 10:22 AM, Johannes Weiner wrote:
> diff --git a/Documentation/accounting/psi.txt b/Documentation/accounting/psi.txt
> new file mode 100644
> index 000000000000..51e7ef14142e
> --- /dev/null
> +++ b/Documentation/accounting/psi.txt
> @@ -0,0 +1,64 @@
> +================================
> +PSI - Pressure Stall Information
> +================================
> +
> +:Date: April, 2018
> +:Author: Johannes Weiner <hannes@xxxxxxxxxxx>
> +
> +When CPU, memory or IO devices are contended, workloads experience
> +latency spikes, throughput losses, and run the risk of OOM kills.
> +
> +Without an accurate measure of such contention, users are forced to
> +either play it safe and under-utilize their hardware resources, or
> +roll the dice and frequently suffer the disruptions resulting from
> +excessive overcommit.
> +
> +The psi feature identifies and quantifies the disruptions caused by
> +such resource crunches and the time impact it has on complex workloads
> +or even entire systems.
> +
> +Having an accurate measure of productivity losses caused by resource
> +scarcity aids users in sizing workloads to hardware--or provisioning
> +hardware according to workload demand.
> +
> +As psi aggregates this information in realtime, systems can be managed
> +dynamically using techniques such as load shedding, migrating jobs to
> +other systems or data centers, or strategically pausing or killing low
> +priority or restartable batch jobs.
> +
> +This allows maximizing hardware utilization without sacrificing
> +workload health or risking major disruptions such as OOM kills.
> +
> +Pressure interface
> +==================
> +
> +Pressure information for each resource is exported through the
> +respective file in /proc/pressure/ -- cpu, memory, and io.
> +

Hi,

> +In both cases, the format for CPU is as such:

I don't see what "In both cases" refers to here.
It seems that you could just remove it.

> +
> +some avg10=0.00 avg60=0.00 avg300=0.00 total=0
> +
> +and for memory and IO:
> +
> +some avg10=0.00 avg60=0.00 avg300=0.00 total=0
> +full avg10=0.00 avg60=0.00 avg300=0.00 total=0
> +
> +The "some" line indicates the share of time in which at least some
> +tasks are stalled on a given resource.

-- 
~Randy
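
For reference, the line format quoted above can be read with a few lines
of Python. This is only a minimal sketch and not part of the patch: the
helper names parse_psi_line and parse_psi are hypothetical, and it
assumes that the avg* fields are decimal numbers and that total is an
integer, as in the sample output.

```python
def parse_psi_line(line):
    """Parse one pressure line, e.g.
    'some avg10=0.00 avg60=0.00 avg300=0.00 total=0'.

    Returns (kind, values), where kind is the leading word ("some" or
    "full") and values maps each key to a number.
    """
    kind, *fields = line.split()
    values = {}
    for field in fields:
        key, _, raw = field.partition("=")
        # Assumption: "total" is an integer counter, the avg* fields are floats.
        values[key] = int(raw) if key == "total" else float(raw)
    return kind, values


def parse_psi(text):
    """Parse the whole contents of a pressure file into {kind: values}."""
    return dict(parse_psi_line(line) for line in text.splitlines() if line.strip())


if __name__ == "__main__":
    # Only works on a kernel that exposes /proc/pressure/.
    with open("/proc/pressure/memory") as f:
        print(parse_psi(f.read()))
```

With the sample above, the cpu file would yield a single "some" entry,
while memory and io would yield both "some" and "full".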