Re: [PATCH -next] sched/cputime: Fix the bug of reading time backward from /proc/stat

zhengzucheng <zhengzucheng@xxxxxxxxxx> · Mon, 5 Sep 2022 11:47:41 +0800

Assume that a CPU time "A" is read from /proc/stat, and after a while
a CPU time "B" is read. If T = B - A < 0, then T, interpreted as an
unsigned integer, becomes a very large number. As a result, the CPU usage
calculated this way will be abnormally high. This looks like a problem that should be fixed.
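
A minimal userspace sketch of that wraparound, using the cpu1 user values 319 and 318 from the report quoted below. The helper name cpu_delta is hypothetical and only for illustration; it is not taken from any monitoring tool.

```c
#include <assert.h>   /* only needed for the check shown in the note below */
#include <stdint.h>

/*
 * Hypothetical helper: difference between two samples of one
 * /proc/stat field, computed the way a naive monitoring tool might.
 */
static uint64_t cpu_delta(uint64_t a, uint64_t b)
{
    /* Unsigned subtraction wraps modulo 2^64 when b < a. */
    return b - a;
}
```

With a = 319 and b = 318, cpu_delta(a, b) returns UINT64_MAX (18446744073709551615) rather than -1, so assert(cpu_delta(319, 318) == UINT64_MAX) holds. Any usage percentage derived from that value is meaningless.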

original link:
https://lore.kernel.org/lkml/20220813000102.42051-1-hucool.lihua@xxxxxxxxxx/

On 2022/8/15 16:15, Peter Zijlstra wrote:
On Sat, Aug 13, 2022 at 08:01:02AM +0800, Li Hua wrote:
The problem is that the reported time goes backward: the value read first is 319, and the value read next is 318. As follows:
first:
cat /proc/stat |  grep cpu1
cpu1    319    0    496    41665    0    0    0    0    0    0
then:
cat /proc/stat |  grep cpu1
cpu1    318    0    497    41674    0    0    0    0    0    0

Time goes back, which is counterintuitive.

After debugging this, the problem turns out to be caused by the implementation of kcpustat_cpu_fetch_vtime(). As follows:

                              CPU0                                          CPU1
First:
show_stat():
  ->kcpustat_cpu_fetch()
    ->kcpustat_cpu_fetch_vtime()
      ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu) + vtime->utime + delta;   rq->curr is in user mode
        ---> CPU1's rq->curr is running in user space, so utime and delta are added
                                                                            ---> rq->curr->vtime->utime is less than 1 tick
Then:
show_stat():
  ->kcpustat_cpu_fetch()
    ->kcpustat_cpu_fetch_vtime()
      ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu);                          rq->curr is in kernel mode
        ---> CPU1's rq->curr is running in kernel space, so only kcpustat is read
This is unreadable, what?!?
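
One possible reading of the diagram above, written as a simplified userspace model rather than kernel code. The struct, its field names, and the 10 ms tick (USER_HZ = 100) are assumptions made for illustration; only the two branches of the read (add vtime->utime and delta in user mode, read kcpustat alone in kernel mode) come from the report.

```c
#include <assert.h>   /* only needed for the checks shown in the note below */
#include <stdbool.h>
#include <stdint.h>

/* Assumption: USER_HZ = 100, so one /proc/stat tick is 10 ms. */
#define NSEC_PER_TICK 10000000ULL

/* Hypothetical stand-in for the per-CPU state; not the kernel's structs. */
struct cpu_model {
    uint64_t kcpustat_user_ns; /* user time already accounted in kcpustat */
    uint64_t vtime_utime_ns;   /* pending user time, still below one tick */
    uint64_t delta_ns;         /* time elapsed since the last vtime update */
    bool     in_user_mode;     /* is rq->curr currently in user mode? */
};

/* Mirrors the two branches shown in the diagram. */
static uint64_t fetch_user_ticks(const struct cpu_model *c)
{
    uint64_t ns = c->kcpustat_user_ns;

    /* The pending utime and delta are added only while in user mode. */
    if (c->in_user_mode)
        ns += c->vtime_utime_ns + c->delta_ns;

    return ns / NSEC_PER_TICK;
}
```

For example, with kcpustat at 318 ticks plus 2 ms, a pending utime of 5 ms and a delta of 4 ms, a read taken while the task is in user mode reports 319. If the task then enters the kernel with 9 ms pending (still less than one tick, so not yet accounted in kcpustat), the next read reports 318, which matches the values seen in /proc/stat.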