On Tue, Mar 15, 2022 at 10:08:57PM +0800, Jui-Tse Huang wrote:
> The load average is one of a common as well as easy observed statistic
> provied by Linux, but still not well documented, which makes the numbers

I'm really not sure what the target audience is; and I hate that you're
making me read rst garbage.

For someone trying to make changes to loadavg.c this document is pure
confusion; for a user trying to make sense of that number, I would
imagine much the same.

> that users observes from the output of top, htop or other system
> monitoring application are only numbers. This patch gives a discussion
> on how Linux calculates the load average as well as what metrics are
> concerned while calculating the load average.
>
> The discussion flow is divided into several parts:
> 1. The expression used to get the load average.
> 2. Why Linux choose such average method from the other.
> 2. The meaning of each term in the expression.
> 3. The metrics, that is, the type of tasks that will be covered in the
>    calculation.
> 4. A brief explanation over the fixed-point nubmer since the weights
>    defined in the Linux kernel are based on it.
>
> Signed-off-by: Jui-Tse Huang <juitse.huang@xxxxxxxxx>
> Signed-off-by: Yiwei Lin <s921975628@xxxxxxxxx>
> Signed-off-by: Ching-Chun (Jim) Huang <jserv@xxxxxxxxxxxxxxxx>
> Co-Developed-by: Yiwei Lin <s921975628@xxxxxxxxx>
>
> ---
>
> v3:
>   - Fix typo (Randy Dunlap)
>   - Add further reading that links to Brendan Gregg's blog
>
> v2:
>   - Fix typo (Chun-Hung Tseng)
>
>  Documentation/scheduler/index.rst        |  1 +
>  Documentation/scheduler/load-average.rst | 82 ++++++++++++++++++++++++
>  2 files changed, 83 insertions(+)
>  create mode 100644 Documentation/scheduler/load-average.rst
>
> diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
> index 88900aabdbf7..bdc779b4190f 100644
> --- a/Documentation/scheduler/index.rst
> +++ b/Documentation/scheduler/index.rst
> @@ -17,6 +17,7 @@ Linux Scheduler
>      sched-nice-design
>      sched-rt-group
>      sched-stats
> +    load-average
>
>      text_files
>
> diff --git a/Documentation/scheduler/load-average.rst b/Documentation/scheduler/load-average.rst
> new file mode 100644
> index 000000000000..27ce6cbae5f4
> --- /dev/null
> +++ b/Documentation/scheduler/load-average.rst
> @@ -0,0 +1,82 @@
> +============
> +Load Average
> +============
> +
> +The load average, provided by common operating systems, indicates the average
> +number of system load over a period of time. In Linux, it shows the average
> +number of tasks running and waiting for CPU time. The following expression is
> +used in Linux to update the load average::
> +
> +                / 0                                      , if t = 0
> +    load_{t} = |
> +                \ load_{t - 1} * exp + active * (1 - exp), otherwise

Easier to follow is:

	load_{0} = 0
	load_{t+1} = load_{t} * exp + active * (1 - exp)

> +The expression represents the exponential moving average of the historical
> +loading of the system. There are several reasons that Linux kernel chooses
> +exponential moving average from other similar average equations such as simple
> +moving average or cumulative moving average:
> +
> +#. The exponential moving average consumes fixed memory space, while the simple
> +   moving average has O(n) space complexity where n is the number of timeslices
> +   within a given interval.
> +#. The exponential moving average not only applies a higher weight to the most
> +   recent record but also declines the weight exponentially, which makes the
> +   resulting load average reflect the situation of the current system. Neither
> +   the simple moving average nor cumulative moving average has this feature.
> +
> +In the expression, the load_{t} indicates the calculated load average at the
> +given time t.
> +The active is the most recent recorded system load. In Linux, the system load

That's inaccurate at best; active is not the load, since you just gave a
definition of load. As such, load is a function of time and active.

Also, stop saying "in Linux"; you're reading the Linux documentation, this
is a given.

> +means the number of tasks in the state of TASK_RUNNING or TASK_UNINTERRUPTIBLE
> +of the entire system. Tasks with TASK_UNINTERRUPTIBLE state are usually waiting
> +for disk I/O or holding an uninterruptible lock, which is considered as a part
> +of system resource, thus, Linux kernel covers them while calculating the load
> +average.

This is inaccurate, consider TASK_NOLOAD.

> +The exp means the weight applied to the previous report of load average, while
> +(1 - exp) is the weight applied to the most recently recorded system load.

I get really itchy from statements like this; either you can read a
formula or you can't, stuff like this doesn't help much in either case.

> +There are three different weights defined in the Linux kernel, in
> +include/linux/sched/loadavg.h, to perform statistics in various timescales::
> +
> +    // include/linux/sched/loadavg.h
> +    ...
> +    #define EXP_1           1884            /* 1/exp(5sec/1min) as fixed-point */
> +    #define EXP_5           2014            /* 1/exp(5sec/5min) */
> +    #define EXP_15          2037            /* 1/exp(5sec/15min) */
> +    ...
> +
> +According to the expression shown on the top of this page, the weight (exp)
> +controls how much of the last load load_{t - 1} will take place in the
> +calculation of current load, while (1 - exp) is the weight applied to the most
> +recent record of system load active.

What page? This is a non-paginated document. Also, you're repeating
yourself.

> +Due to the security issue, the weights are defined as fixed-point numbers based

This is complete nonsense.

> +on the unsigned integer rather than floating-pointing numbers. The introduction
> +of the fixed-point number keeps the FPU away from the calculation process. Since
> +the precision of the fixed-point used in the Linux kernel is 11 bits, a
> +fixed-point can be converted to a floating-point by dividing it by 2048, as in
> +the expressions shown bellow::
> +
> +    EXP_1 = 1884 / 2048 = 0.919922
> +    EXP_5 = 2014 / 2048 = 0.983398
> +    EXP_15 = 2037 / 2048 = 0.994629
> +
> +Which indicates the weights applied to active are::
> +
> +    (1 - EXP_1) = (1 - 0.919922) = 0.080078
> +    (1 - EXP_5) = (1 - 0.983398) = 0.016602
> +    (1 - EXP_15) = (1 - 0.994629) = 0.005371

I don't think this is the place to explain fixed point arithmetic. The
consumer of load_avg doesn't need to know; the developer looking at
loadavg.c will have *MUCH* bigger problems.

> +The load average will be updated every 5 seconds. Each time the scheduler_tick()
> +be called, the function calc_global_load_tick() will also be invoked, which
> +makes the active of each CPU core be calculated and be merged globally. Finally,
> +the load average will be updated with that global active.

That's wishful thinking, have you read loadavg.c ?
> +
> +As a user, the load average can be observed via top, htop, or other system
> +monitor application, or more directly, by the following command::
> +
> +    $ cat /proc/loadavg
> +
> +Further Reading
> +---------------
> +The explanation and analysis done by Brendan Gregg on `his blog
> +<https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html>`_.

That blogpost is actually useful, unlike most of what you've written
here. Why not only link that and leave out the rest?
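
For anyone who wants to check the arithmetic quoted in the patch, here is
a minimal userspace sketch of one step of the recurrence
load_{t+1} = load_{t} * exp + active * (1 - exp) in 11-bit fixed point.
It is an illustration only and not the code in loadavg.c: EXP_1, EXP_5
and EXP_15 are the values quoted in the patch, while FSHIFT, FIXED_1 and
ema_step() are names assumed for this example, and rounding is plain
truncation chosen for simplicity.

```c
/*
 * Userspace sketch only, not kernel code. EXP_* are the constants quoted
 * in the patch; FSHIFT, FIXED_1 and ema_step() are assumed names used
 * for illustration.
 */
#define FSHIFT   11                 /* 11 bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point, i.e. 2048 */
#define EXP_1    1884UL             /* 1884 / 2048 = 0.919922 */
#define EXP_5    2014UL             /* 2014 / 2048 = 0.983398 */
#define EXP_15   2037UL             /* 2037 / 2048 = 0.994629 */

/*
 * One step of load_{t+1} = load_{t} * exp + active * (1 - exp).
 * load, exp and active are all scaled by FIXED_1, so each product is
 * scaled twice; the final shift drops one factor again (truncating).
 */
static unsigned long ema_step(unsigned long load, unsigned long exp,
                              unsigned long active)
{
	return (load * exp + active * (FIXED_1 - exp)) >> FSHIFT;
}
```

Starting from load = 0 with active held at 2 * FIXED_1, a single step
with EXP_1 yields 328, which is 328 / 2048 = 0.16 in decimal and matches
2 * 0.080078 from the table in the patch. When load already equals
active, a step leaves the value unchanged, whichever weight is used.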