Re: [External] Re: [PATCH] cgroup/rstat: record the cumulative per-cpu time of cgroup and its descendants

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2023/8/3 Tejun Heo wrote:
I couldn't come up with an answer. Let's go ahead with adding the field but
can you please do the followings?


Thank you for your suggestion. I am very willing to do it.

* Name it to something like subtree_bstat instead of cumul_bstat. The
   counters are all cumulative.

I did it in v2 patch.


* Are you sure the upward propagation logic is correct? It's calculating
   global delta and then propagating to the per-cpu delta of the parent. Is
   that correct because the two delta calculations always end up the same?

Sorry, I made a mistake and misled you. These two deltas are not always equal.

I found and reproduced a bug case:
We build /sys/fs/cgroup/test /sys/fs/cgroup/test/t1, /sys/fs/cgroup/test/t1/tt1 in turn. And there are only tasks in /sys/fs/cgroup/test/t1/tt1. After applying this patch, some operations and corresponding data changes are as follows:

Step 1、cat /sys/fs/cgroup/test/t1/tt1/cpu.stat
*cpu 6* current flush cgroup /test/t1/tt1
per-cpu delta utime 0 stime 0 sum_exec_runtime 257341
/test/t1/tt1 cumul_bstat(cpu 6) utime 0 stime 0 sum_exec_runtime 257341
/test/t1/tt1 cgrp->bstat utime 0 stime 0 sum_exec_runtime 257341
if (cgroup_parent(parent)) {
    cgrp delta utime 0 stime 0 sum_exec_runtime 257341
    parent(/test/t1) ->bstat utime 0 stime 0 sum_exec_runtime 257341
    parent(/test/t1) last_bstat utime 0 stime 0 sum_exec_runtime 0
    parent(/test/t1) cumul_bstat utime 0 stime 0 sum_exec_runtime 257341
}


Step 2、cat /sys/fs/cgroup/test/t1/tt1/cpu.stat
*cpu 12* current flush cgroup /test/t1/tt1
per-cpu delta utime 0 stime 1000000 sum_exec_runtime 747042
/test/t1/tt1 cumul_bstat utime 0 stime 1000000 sum_exec_runtime 747042
/test/t1/tt1 cgrp->bstat utime 0 stime 1000000 sum_exec_runtime 1004383
if (cgroup_parent(parent)) {
    cgrp delta utime 0 stime 1000000 sum_exec_runtime 747042
    parent(/test/t1) ->bstat utime 0 stime 1000000 sum_exec_runtime 1004383
    parent(/test/t1) last_bstat utime 0 stime 0 sum_exec_runtime 0
parent(/test/t1) cumul_bstat utime 0 stime 1000000 sum_exec_runtime 747042
}


Step 3、cat /sys/fs/cgroup/test/cpu.stat
(cgroup fulsh /test/t1/tt1 -> /test/t1 -> /test in turn)

*cpu 6*  current flush cgroup /test/t1/tt1
per-cpu delta utime 0 stime 0 sum_exec_runtime 263468
/test/t1/tt1 cumul_bstat(cpu 6) utime 0 stime 0 sum_exec_runtime 520809
/test/t1/tt1 cgrp->bstat utime 0 stime 1000000 sum_exec_runtime 1267851
if (cgroup_parent(parent)) {
    cgrp delta utime 0 stime 0 sum_exec_runtime 263468
    parent(/test/t1) ->bstat utime 0 stime 1000000 sum_exec_runtime 1267851
    parent(/test/t1) last_bstat utime 0 stime 0 sum_exec_runtime 0
    parent(/test/t1) cumul_bstat utime 0 stime 0 sum_exec_runtime 520809
}

*cpu 6* current flush cgroup /test/t1
per-cpu delta utime 0 stime 0 sum_exec_runtime 0
/test/t1 cumul_bstat(cpu 6) utime 0 stime 0 sum_exec_runtime 520809
/test/t1 cgrp->bstat utime 0 stime 1000000 sum_exec_runtime 1267851
if (cgroup_parent(parent)) {
    cgrp delta utime 0 stime 1000000 sum_exec_runtime 1267851  <---
    parent(/test) ->bstat utime 0 stime 1000000 sum_exec_runtime 1267851
    parent(/test) last_bstat utime 0 stime 0 sum_exec_runtime 0
parent(/test) cumul_bstat (cpu 6) utime 0 stime 1000000 sum_exec_runtime 1267851 <--- *error*
******
  Here cgrp delta is *not equal* to per-cpu delta.
The frequency of cgroup (/test) and its chiled cgroup (/test/t1/tt1) flush is inconsistent. In other words (when we call cgroup_base_stat_flush(), we will not necessarily flush to the highest-level cgroup except root(like step 1 and 2 above)). Therefore, cgrp delta may contain the cumulative value of multiple per-cpu deltas.

The correct value of parent(/test) cumul_bstat should be utime 0 stime 0 sum_exec_runtime 520809.
******
}

*cpu 6* current flush cgroup /test
per-cpu delta utime 0 stime 0 sum_exec_runtime 0
cumul_bstat utime 0 stime 1000000 sum_exec_runtime 1267851
/test ->bstat utime 0 stime 1000000 sum_exec_runtime 1267851
	cgroup_parent(parent) is NULL end.


So we should add a per-cpu variable subtree_last_bstat similar to cgrp->last_bstat to record the last value.

I have sent v2 patch, please review it again.
v2 link:
https://lore.kernel.org/all/20230807032930.87785-1-jiahao.os@xxxxxxxxxxxxx


* Please add a comment explaining that the field is not currently used
   outside of being read from bpf / drgn and what not and that we're still
   trying to determine how to expose that in the cgroupfs interface.


Thanks, I did it in v2 patch.

Thanks,
Hao



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux