Re: [PATCH bpf-next v9 1/4] bpf: add bpf_get_cpu_time_counter kfunc

On 01.12.2024 12:46, Thomas Gleixner wrote:
On Fri, Nov 22 2024 at 16:58, Vadim Fedorenko wrote:
diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index de0f9e5f9f73..a549aea25f5f 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -2094,6 +2094,13 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
  			if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
  				int err;
+				if (imm32 == BPF_CALL_IMM(bpf_get_cpu_time_counter)) {
+					if (cpu_feature_enabled(X86_FEATURE_LFENCE_RDTSC))
+						EMIT3(0x0F, 0xAE, 0xE8);
+					EMIT2(0x0F, 0x31);

What guarantees that RDTSC is supported by the CPU?

Well, technically it may be a problem on x86_32, because there are x86-compatible
platforms which don't have RDTSC. But they are 16+ years old, and I'm not sure
we expose the vDSO on such platforms.
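
For illustration only, here is a user-space sketch of the kind of check the
question above is asking for. It is not part of the patch. It assumes a
GCC/Clang toolchain that provides <cpuid.h>; the helper name cpu_has_tsc() is
hypothetical. CPUID leaf 1 reports TSC support in EDX bit 4. Inside the kernel
the corresponding test would presumably be a feature-flag check on
X86_FEATURE_TSC rather than a raw CPUID call.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Hypothetical helper: CPUID.01H:EDX bit 4 is the TSC feature flag. */
static bool cpu_has_tsc(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 if the requested leaf is not available. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;

	return (edx >> 4) & 1;
}
#else
/* Non-x86 build: there is no TSC to report. */
static bool cpu_has_tsc(void)
{
	return false;
}
#endif
```

A JIT could use such a check to fall back to a regular call when the
instruction is not available, instead of emitting RDTSC unconditionally.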


Aside of that, if you want the read to be ordered, then you need to take
RDTSCP into account too.

Yes, we have already had this discussion. RDTSCP has the same ordering
guarantees as "LFENCE; RDTSC" according to the programming manuals. But it also
provides a "cookie" value, which is not used in this case and just clobbers
ECX. To avoid additional register manipulation, I used the LFENCE option.
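
To make the trade-off concrete, a minimal user-space sketch of the two
variants, assuming an x86-64 build with GCC or Clang and <x86intrin.h>. The
function names are hypothetical and this is not the JIT code. The bytes emitted
in the hunk above, 0F AE E8 followed by 0F 31, are the encodings of LFENCE and
RDTSC, i.e. the first variant.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <x86intrin.h>

/* Variant 1: LFENCE; RDTSC. Only EAX and EDX are written. */
static inline uint64_t tsc_lfence_rdtsc(void)
{
	_mm_lfence();
	return __rdtsc();
}

/*
 * Variant 2: RDTSCP. In addition to EDX:EAX it loads IA32_TSC_AUX
 * (the "cookie") into ECX, so a JIT would have to treat ECX as
 * clobbered. Requires a CPU that supports RDTSCP.
 */
static inline uint64_t tsc_rdtscp(uint32_t *aux)
{
	unsigned int cookie;
	uint64_t tsc = __rdtscp(&cookie);

	*aux = cookie;
	return tsc;
}
#else
/* Non-x86 build: stub so the sketch still compiles. */
static inline uint64_t tsc_lfence_rdtsc(void)
{
	return 0;
}
#endif
```

With the first variant the JIT only needs to account for the EDX:EAX result;
with the second it would also have to save and restore, or avoid using, ECX.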

+#if IS_ENABLED(CONFIG_GENERIC_GETTIMEOFDAY)
+__bpf_kfunc u64 bpf_get_cpu_time_counter(void)
+{
+	const struct vdso_data *vd = __arch_get_k_vdso_data();
+
+	vd = &vd[CS_RAW];
+
+	/* CS_RAW clock_mode translates to VDSO_CLOCKMODE_TSC on x86 and

How so?

vd->clock_mode is not guaranteed to be VDSO_CLOCKMODE_TSC or
VDSO_CLOCKMODE_ARCHTIMER. CS_RAW is the access to the raw (uncorrected)
time of the current clocksource. If the clock mode is not matching, then
you cannot access it.

That's more about x86 and virtualization options. But in the end it all comes
down to reading the TSC value. And we do JIT anyway, so this function call will
never be executed on x86. Other architectures (well, apart from MIPS) don't care
about vd->clock_mode at all. And we don't provide kfuncs for architectures
without JIT.

For MIPS I think I can ifdef these new kfuncs so that they are only built when
CONFIG_CSRC_R4K is not defined.

I'm going to create a patchset to implement arch-specific replacements for all
architectures supported by BPF JIT, so in the end this call will effectively
never be executed.


+	 * to VDSO_CLOCKMODE_ARCHTIMER on aarch64/risc-v. We cannot use
+	 * vd->clock_mode directly because it brings possible access to
+	 * pages visible by user-space only via vDSO.

How so? vd->clock_mode is kernel visible.

vd->clock_mode is kernel visible, but the compiler cannot optimize out the code
which accesses user-space pages unless I provide a constant value here.
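
A stand-alone sketch of the constant-propagation argument, with every name in
it hypothetical (fake_vdso_data, MODE_HW, MODE_PAGE, get_counter and the rest
are stand-ins, not kernel symbols). It models a counter accessor that either
reads a hardware counter or dereferences a data page, depending on the clock
mode.

```c
#include <stdint.h>

/* Hypothetical clock modes, standing in for the VDSO_CLOCKMODE_* values. */
enum { MODE_NONE = 0, MODE_HW = 1, MODE_PAGE = 2 };

/* Hypothetical stand-in for struct vdso_data. */
struct fake_vdso_data {
	int32_t clock_mode;
	const volatile uint64_t *page;	/* models a user-only mapping */
};

/* Stand-in for the real hardware counter read. */
static uint64_t read_hw_counter(void)
{
	return 42;
}

static inline uint64_t get_counter(int32_t clock_mode,
				   const struct fake_vdso_data *vd)
{
	if (clock_mode == MODE_HW)
		return read_hw_counter();
	if (clock_mode == MODE_PAGE)
		return *vd->page;	/* the access in question */
	return UINT64_MAX;
}

/* Constant mode: the MODE_PAGE branch is provably dead after inlining. */
uint64_t counter_const(const struct fake_vdso_data *vd)
{
	return get_counter(MODE_HW, vd);
}

/* Run-time mode: the compiler has to keep the page access. */
uint64_t counter_runtime(const struct fake_vdso_data *vd)
{
	return get_counter(vd->clock_mode, vd);
}
```

With optimization enabled, counter_const() can be reduced to the hardware read
alone, while counter_runtime() keeps the dereference. The sketch covers only
this code-generation point; whether a fixed constant selects the intended mode
on every architecture is a separate question.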


+	 * But the constant value
+	 * of 1 is exactly what we need - it works for any architecture and
+	 * translates to reading of HW timecounter regardless of architecture.

It does not. Care to look at MIPS?

Yes, MIPS is quite specific here. But again, the goal is to have a JIT
implementation for all architectures, so this function will never actually be
called this way.


+	 * We still have to provide vdso_data for some architectures to avoid
+	 * NULL pointer dereference.
+	 */
+	return __arch_get_hw_counter(1, vd);

This is outright dangerous. __arch_get_hw_counter() is for VDSO usage
and not for in kernel usage. What guarantees you that the architecture
specific implementation does not need access to user only mappings.

Aside of that what guarantees that '1' is what you want and stays that
way forever? It's already broken on MIPS.

I can ifdef out the MIPS case until we have a JIT for it (the HW counter
implementation there is pretty straightforward).


Thanks,

         tglx




