Re: [PATCH bpf-next v8 0/4] bpf: add cpu cycles kfuncs

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 26 Nov 2024 10:12:57 -0800

On Fri, Nov 22, 2024 at 3:34 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Wed, Nov 20, 2024 at 04:08:10PM -0800, Vadim Fedorenko wrote:
> > This patchset adds 2 kfuncs to provide a way to precisely measure the
> > time spent running some code. The first patch provides a way to get cpu
> > cycles counter which is used to feed CLOCK_MONOTONIC_RAW. On x86
> > architecture it is effectively rdtsc_ordered() function while on other
> > architectures it falls back to __arch_get_hw_counter(). The second patch
> > adds a kfunc to convert cpu cycles to nanoseconds using shift/mult
> > constants discovered by kernel. The main use-case for this kfunc is to
> > convert deltas of timestamp counter values into nanoseconds. It is not
> > supposed to get CLOCK_MONOTONIC_RAW values as offset part is skipped.
> > JIT version is done for x86 for now, on other architectures it falls
> > back to slightly simplified version of vdso_calc_ns.
>
> So having now read this. I'm still left wondering why you would want to
> do this.
>
> Is this just debug stuff, for when you're doing a poor man's profile
> run? If it is, why do we care about all the precision or the ns. And why
> aren't you using perf?

No, it's not debug stuff. It's meant to be used in production to
measure durations of whatever is needed, such as uprobe entry/exit
duration or the time between scheduling switches.

Vadim emphasizes benchmarking at scale, but that's a bit misleading.
It's not "benchmarking", it's measuring durations of relevant pairs of
events. This runs in production and at scale, so any unnecessary
overhead adds up, and we'd like the minimal possible overhead for
measuring the passage of time. Some durations are very brief, so
precision matters as well. And since the results are later aggregated
and compared across large swaths of production hosts, we need
comparable units, which is why we want nanoseconds and not some
abstract "time cycles".
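
To illustrate the units point, here is a minimal userspace sketch of a
mult/shift conversion of a cycle delta into nanoseconds. It is not the
code from the patchset: the struct, the function names, and the way
mult is derived from a nominal frequency are all hypothetical, and the
kernel takes its constants from the clocksource rather than computing
them like this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-host conversion constants (illustrative only). */
struct cyc2ns {
	uint32_t mult;
	uint32_t shift;
};

/*
 * Derive mult for an assumed counter frequency so that
 * ns ~= (cycles * mult) >> shift. Hypothetical helper.
 */
static struct cyc2ns cyc2ns_from_hz(uint64_t freq_hz, uint32_t shift)
{
	struct cyc2ns c;
	uint64_t mult;

	assert(freq_hz > 0 && shift <= 24); /* keep 1e9 << shift within 64 bits */
	mult = (UINT64_C(1000000000) << shift) / freq_hz;
	assert(mult <= UINT32_MAX);
	c.mult = (uint32_t)mult;
	c.shift = shift;
	return c;
}

/* Convert a delta of counter values to nanoseconds. */
static uint64_t cycles_delta_to_ns(uint64_t delta, struct cyc2ns c)
{
	assert(c.shift < 64);
	/* 128-bit intermediate (GCC/Clang extension) avoids overflow. */
	return (uint64_t)(((unsigned __int128)delta * c.mult) >> c.shift);
}
```

With these assumptions, a delta of 2000 cycles on a 2 GHz counter and
a delta of 4000 cycles on a 4 GHz counter both convert to 1000 ns, so
the values can be aggregated across hosts, while the raw cycle counts
cannot.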

Does this address your concerns?

>
> Is it something else?
>
> Again, what are you going to do with this information?

There are many specific uses, all of which currently rely on pairs of
bpf_ktime_get_ns() helper calls, each of which calls into
ktime_get_mono_fast_ns(). These new kfuncs are meant to be a faster
replacement for 2 x bpf_ktime_get_ns() calls.

The information itself is collected and emitted into a centralized
data storage and querying system used by tons of engineers for
whatever they need. I'm not sure we need to go into specific
individual use cases. There are many of them and they all vary. The
common need is to measure the passage of real wall-clock time.