On 19/11/2024 03:28, Peter Zijlstra wrote:
On Mon, Nov 18, 2024 at 10:52:43AM -0800, Vadim Fedorenko wrote:
+ if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
+ imm32 == BPF_CALL_IMM(bpf_cpu_cycles_to_ns) &&
+ cpu_feature_enabled(X86_FEATURE_CONSTANT_TSC)) {
+ u32 mult, shift;
+
+ clocks_calc_mult_shift(&mult, &shift, tsc_khz, USEC_PER_SEC, 0);
+ /* imul RAX, RDI, mult */
+ maybe_emit_mod(&prog, BPF_REG_1, BPF_REG_0, true);
+ EMIT2_off32(0x69, add_2reg(0xC0, BPF_REG_1, BPF_REG_0),
+ mult);
+
+ /* shr RAX, shift (which is less than 64) */
+ maybe_emit_1mod(&prog, BPF_REG_0, true);
+ EMIT3(0xC1, add_1reg(0xE8, BPF_REG_0), shift);
+
+ break;
+ }
This is ludicrously horrible. Why are you using your own mult/shift and
not offset here instead of using the one from either sched_clock or
clocksource_tsc ?
With X86_FEATURE_CONSTANT_TSC, tsc_khz is actually constant after
switching from tsc_early. And the very same call to
clocks_calc_mult_shift() is used to create clocksource_tsc mult and
shift constants. Unfortunately, clocksources don't have proper API to
get the underlying info, that's why I have to calculate shift and mult
values on my own.
And being totally inconsistent with your own alternative implementation
which uses the VDSO, which in turn uses clocksource_tsc:
With what I said above it is consistent with clocksource_tsc.
+__bpf_kfunc u64 bpf_cpu_cycles_to_ns(u64 cycles)
+{
+ const struct vdso_data *vd = __arch_get_k_vdso_data();
+
+ vd = &vd[CS_RAW];
+ /* kfunc implementation does less manipulations than vDSO
+ * implementation. BPF use-case assumes two measurements are close
+ * in time and can simplify the logic.
+ */
+ return mul_u64_u32_shr(cycles, vd->mult, vd->shift);
+}
Also, if I'm not mistaken, the above is broken, you really should add
the offset, without it I don't think we guarantee the result is
monotonic.
Not quite sure how constant offset can affect monotonic guarantee of
cycles, given that the main use case will be to calculate ns out of
small deltas? AFAIU, the offset is needed to get ns of CLOCK_MONOTONIC,
which can be affected by NTP manipulation. But in this helper we don't
follow any clock_id, we just want to calculate nanoseconds value out of
stable and monotonically increasing counter provided by architecture.