Re: [PATCH v1] cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints

On 5/3/24 10:00, Jesper Dangaard Brouer wrote:
I may have been mistakenly thinking the lock hold time referred to just the cpu_lock. Your reported times here are about the cgroup_rstat_lock, right? If so, the numbers make sense to me.


True, my reported numbers here are about the cgroup_rstat_lock.
Glad to hear we are more aligned, then 🙂

Given I just got some prod machines online with this patch's
cgroup_rstat_cpu_lock tracepoints, I can give you some early results
about hold time for the cgroup_rstat_cpu_lock.

From this one-liner bpftrace command:

  sudo bpftrace -e '
         tracepoint:cgroup:cgroup_rstat_cpu_lock_contended {
           @start[tid]=nsecs; @cnt[probe]=count()}
         tracepoint:cgroup:cgroup_rstat_cpu_locked {
           $now=nsecs;
           if (args->contended) {
             @wait_per_cpu_ns=hist($now-@start[tid]); delete(@start[tid]);}
           @cnt[probe]=count(); @locked[tid]=$now}
         tracepoint:cgroup:cgroup_rstat_cpu_unlock {
           $now=nsecs;
           @locked_per_cpu_ns=hist($now-@locked[tid]); delete(@locked[tid]);
           @cnt[probe]=count()}
         interval:s:1 {time("%H:%M:%S "); print(@wait_per_cpu_ns);
           print(@locked_per_cpu_ns); print(@cnt); clear(@cnt);}'

Results from one 1-sec period:

13:39:55 @wait_per_cpu_ns:
[512, 1K)              3 |                                                    |
[1K, 2K)              12 |@                                                   |
[2K, 4K)             390 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K)              70 |@@@@@@@@@                                           |
[8K, 16K)             24 |@@@                                                 |
[16K, 32K)           183 |@@@@@@@@@@@@@@@@@@@@@@@@                            |
[32K, 64K)            11 |@                                                   |

@locked_per_cpu_ns:
[256, 512)         75592 |@                                                   |
[512, 1K)        2537357 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1K, 2K)          528615 |@@@@@@@@@@                                          |
[2K, 4K)          168519 |@@@                                                 |
[4K, 8K)          162039 |@@@                                                 |
[8K, 16K)         100730 |@@                                                  |
[16K, 32K)         42276 |                                                    |
[32K, 64K)          1423 |                                                    |
[64K, 128K)           89 |                                                    |

 @cnt[tracepoint:cgroup:cgroup_rstat_cpu_lock_contended]: 3 /sec
 @cnt[tracepoint:cgroup:cgroup_rstat_cpu_unlock]: 3200 /sec
 @cnt[tracepoint:cgroup:cgroup_rstat_cpu_locked]: 3200 /sec


So, we see the "flush-code-path" per-CPU hold time (@locked_per_cpu_ns)
isn't exceeding 128 usec.
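
(If it becomes interesting whether a few flushing CPUs dominate the
64-128 usec tail, the same tracepoints should be able to feed a
hold-time histogram keyed on bpftrace's cpu builtin. Untested sketch;
the map name @locked_ns_by_cpu is made up here:

  sudo bpftrace -e '
         tracepoint:cgroup:cgroup_rstat_cpu_locked {
           @locked[tid]=nsecs}
         tracepoint:cgroup:cgroup_rstat_cpu_unlock /@locked[tid]/ {
           /* key hold time by the CPU running the flusher (not the flushed CPU) */
           @locked_ns_by_cpu[cpu]=hist(nsecs-@locked[tid]);
           delete(@locked[tid])}'

That would show whether the long holds cluster on particular flushing
CPUs or are spread evenly.)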

My latency requirement, to avoid RX-queue overflow with 1024 slots
running at 25 Gbit/s, is 27.6 usec with small packets and 500 usec
(0.5 ms) with MTU-size packets.  The observed hold times are very
close to these latency requirements.
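
(For reference, those budgets follow from simple wire-rate arithmetic;
roughly, assuming 1024 RX descriptors and 20 bytes of preamble+IFG
overhead per Ethernet frame:

  Small frames:   64 B + 20 B =   84 B =   672 bits on the wire
    25 Gbit/s / 672 bits   ~= 37.2 Mpps -> 1024 / 37.2 Mpps ~= 27.5 usec
  MTU frames:   1518 B + 20 B = 1538 B = 12304 bits on the wire
    25 Gbit/s / 12304 bits ~= 2.03 Mpps -> 1024 / 2.03 Mpps ~= 504 usec

i.e. roughly the 27.6 usec and 0.5 ms figures above.)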

Thanks for sharing the data.

This is more aligned with what I would have expected. Still, a tail of up to 128 usec is on the high side. I remember from my latency testing when I worked on the cpu_lock latency patch that it was in the two-digit range. Perhaps there are other sources of noise, or the update list is really long. Anyway, it may be a bit hard to reach the 27.6 usec target for small packets.

Cheers,
Longman




