Assume that a CPU time "A" is read from /proc/stat and, after a while,
a CPU time "B" is read. If T = B - A < 0, T wraps around to a very large
number when it is treated as an unsigned integer. As a result, the CPU
usage calculated this way will be abnormally high. It seems to be a
problem that should be fixed.
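
To make the wraparound concrete, here is a minimal sketch in C. It assumes
the monitoring tool stores the samples as 64-bit unsigned integers; the
function names delta_unsigned() and delta_clamped() are hypothetical and are
not taken from any particular tool.

```c
#include <stdint.h>

/*
 * Plain unsigned subtraction of two samples. If the second sample "b"
 * is smaller than the first sample "a", the result wraps modulo 2^64.
 */
static uint64_t delta_unsigned(uint64_t a, uint64_t b)
{
	return b - a;
}

/*
 * One possible userspace workaround (an assumption, not the kernel fix):
 * treat a counter that appears to go backward as "no progress".
 */
static uint64_t delta_clamped(uint64_t a, uint64_t b)
{
	return b >= a ? b - a : 0;
}
```

With A = 319 and B = 318, delta_unsigned() returns UINT64_MAX, while
delta_clamped() returns 0. Clamping only hides the symptom in the reader;
the counter itself still goes backward.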
original link:
https://lore.kernel.org/lkml/20220813000102.42051-1-hucool.lihua@xxxxxxxxxx/
On 2022/8/15 16:15, Peter Zijlstra wrote:
On Sat, Aug 13, 2022 at 08:01:02AM +0800, Li Hua wrote:
The problem is that the statistical time goes backward: the value read first is 319, and the value read next is 318, as follows:
first:
cat /proc/stat | grep cpu1
cpu1 319 0 496 41665 0 0 0 0 0 0
then:
cat /proc/stat | grep cpu1
cpu1 318 0 497 41674 0 0 0 0 0 0
Time goes back, which is counterintuitive.
After debugging this, the problem turns out to be caused by the implementation of kcpustat_cpu_fetch_vtime(), as follows:
CPU0                                            CPU1
First:
show_stat():
  ->kcpustat_cpu_fetch()
    ->kcpustat_cpu_fetch_vtime()
      ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu) + vtime->utime + delta;
                                                rq->curr is in user mode
      ---> When CPU1's rq->curr is running in userspace, utime and delta need to be added
      ---> rq->curr->vtime->utime is less than 1 tick
Then:
show_stat():
  ->kcpustat_cpu_fetch()
    ->kcpustat_cpu_fetch_vtime()
      ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu);
                                                rq->curr is in kernel mode
      ---> When CPU1's rq->curr is running in kernel space, only kcpustat is read
This is unreadable, what?!?
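
The two read paths in the diagram can also be written down as a toy model
in C. This is only a sketch: struct toy_cpu, its fields, and the helper
names are hypothetical stand-ins for the kernel state, millisecond units
are chosen for convenience, and USER_HZ == 100 is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; these are not kernel definitions. */
struct toy_cpu {
	uint64_t accounted_user_ms; /* stands in for kcpustat_cpu(cpu) user time */
	uint64_t pending_utime_ms;  /* stands in for vtime->utime, not yet flushed */
	uint64_t delta_ms;          /* time since the last vtime update */
	bool in_user;               /* is rq->curr currently in user mode? */
};

/* Mirrors the two branches described in the diagram. */
static uint64_t toy_fetch_user_ms(const struct toy_cpu *c)
{
	if (c->in_user)
		return c->accounted_user_ms + c->pending_utime_ms + c->delta_ms;
	/* Kernel mode: the pending user time is not added. */
	return c->accounted_user_ms;
}

/* Convert to the 10 ms units used by /proc/stat (assumes USER_HZ == 100). */
static uint64_t toy_to_clock_t(uint64_t ms)
{
	return ms / 10;
}
```

With accounted_user_ms = 3185, pending_utime_ms = 3 and delta_ms = 2, a read
while the task is in user mode reports 319. If the task then enters the
kernel before the pending time is flushed, the next read reports 318.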