On 5/3/24 10:00, Jesper Dangaard Brouer wrote:
I may have mistakenly thought the lock hold time refers to just the
cpu_lock. Your reported times here are about the cgroup_rstat_lock.
Right? If so, the numbers make sense to me.
True, my reported numbers here are about the cgroup_rstat_lock.
Glad to hear; we are more aligned then 🙂
Given I just got some prod machines online with this patch's
cgroup_rstat_cpu_lock tracepoints, I can give you some early results
about hold time for the cgroup_rstat_cpu_lock, from this one-liner
bpftrace command:
sudo bpftrace -e '
  // Per-CPU lock contended: record when this task started waiting.
  tracepoint:cgroup:cgroup_rstat_cpu_lock_contended {
    @start[tid] = nsecs; @cnt[probe] = count(); }
  // Lock acquired: histogram of wait time (if contended), record hold start.
  tracepoint:cgroup:cgroup_rstat_cpu_locked {
    $now = nsecs;
    if (args->contended) {
      @wait_per_cpu_ns = hist($now - @start[tid]);
      delete(@start[tid]); }
    @cnt[probe] = count(); @locked[tid] = $now; }
  // Lock released: histogram of per-CPU hold time.
  tracepoint:cgroup:cgroup_rstat_cpu_unlock {
    $now = nsecs;
    @locked_per_cpu_ns = hist($now - @locked[tid]);
    delete(@locked[tid]);
    @cnt[probe] = count(); }
  // Print histograms and per-second event counters every second.
  interval:s:1 { time("%H:%M:%S "); print(@wait_per_cpu_ns);
    print(@locked_per_cpu_ns); print(@cnt); clear(@cnt); }'
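(For anyone reproducing this: the tracepoint names above can be
confirmed on a kernel with the patch applied by listing the probes,
e.g.

  sudo bpftrace -l 'tracepoint:cgroup:cgroup_rstat*'

which should show both the global and the per-CPU lock tracepoints; if
nothing is listed, the tracepoints are not available on that kernel.)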
Results from one 1-second period:
13:39:55 @wait_per_cpu_ns:
[512, 1K) 3 | |
[1K, 2K) 12 |@ |
[2K, 4K) 390 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K) 70 |@@@@@@@@@ |
[8K, 16K) 24 |@@@ |
[16K, 32K) 183 |@@@@@@@@@@@@@@@@@@@@@@@@ |
[32K, 64K) 11 |@ |
@locked_per_cpu_ns:
[256, 512) 75592 |@ |
[512, 1K) 2537357 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1K, 2K) 528615 |@@@@@@@@@@ |
[2K, 4K) 168519 |@@@ |
[4K, 8K) 162039 |@@@ |
[8K, 16K) 100730 |@@ |
[16K, 32K) 42276 | |
[32K, 64K) 1423 | |
[64K, 128K) 89 | |
@cnt[tracepoint:cgroup:cgroup_rstat_cpu_lock_contended]: 3 /sec
@cnt[tracepoint:cgroup:cgroup_rstat_cpu_unlock]: 3200 /sec
@cnt[tracepoint:cgroup:cgroup_rstat_cpu_locked]: 3200 /sec
So, we see that the flush code path's per-CPU hold time
(@locked_per_cpu_ns) doesn't exceed 128 usec.
My latency requirement, to avoid RX-queue overflow with 1024 slots
running at 25 Gbit/s, is 27.6 usec with small packets and 500 usec
(0.5 ms) with MTU-size packets. The observed hold times are very close
to these requirements.
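For reference, here is the back-of-the-envelope arithmetic behind those
two numbers (assuming standard Ethernet framing: 64-byte minimum frames,
1500-byte MTU plus 18 bytes of header/FCS, and 20 bytes of preamble +
inter-frame gap per frame on the wire):

  small packets: (64 + 20) B = 672 bits/frame
                 25 Gbit/s / 672 bits ≈ 37.2 Mpps => 1024 slots / 37.2 Mpps ≈ 27.5 usec
  MTU packets:   (1500 + 18 + 20) B = 12304 bits/frame
                 25 Gbit/s / 12304 bits ≈ 2.03 Mpps => 1024 slots / 2.03 Mpps ≈ 504 usec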
Thanks for sharing the data.
This is more aligned with what I would have expected. Still, a high of
up to 128 usec is on the high side. I remember that during my latency
testing, when I worked on the cpu_lock latency patch, it was in the
2-digit (usec) range. Perhaps there are other sources of noise, or the
update list is really long. Anyway, it may be a bit hard to reach the
27.6 usec target for small packets.
Cheers,
Longman