On Tue, Jun 7, 2022 at 12:35 PM Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx> wrote: > > > On 07/06/2022 11.14, Thomas Gleixner wrote: > > Alexei, > > > > On Mon, Jun 06 2022 at 08:57, Alexei Starovoitov wrote: > >> On Mon, Jun 6, 2022 at 3:38 AM Kurt Kanzenbach <kurt@xxxxxxxxxxxxx> wrote: > >>> > >>> From: Jesper Dangaard Brouer <brouer@xxxxxxxxxx> > >>> > >>> Commit 3dc6ffae2da2 ("timekeeping: Introduce fast accessor to clock tai") > >>> introduced a fast and NMI-safe accessor for CLOCK_TAI. Especially in time > >>> sensitive networks (TSN), where all nodes are synchronized by Precision Time > >>> Protocol (PTP), it's helpful to have the possibility to generate timestamps > >>> based on CLOCK_TAI instead of CLOCK_MONOTONIC. With a BPF helper for TAI in > >>> place, it becomes very convenient to correlate activity across different > >>> machines in the network. > >> > >> That's a fresh feature. It feels risky to bake it into uapi already. > > > > What? That's just support for a different CLOCK. What's so risky about > > it? > > I didn't think it was "risky" as this is already exported as: > EXPORT_SYMBOL_GPL(ktime_get_tai_fast_ns); > > Correct me if I'm wrong, but this simple gives BPF access to CLOCK_TAI > (see man clock_gettime(2)), right? > And CLOCK_TAI is not really a new/fresh type of CLOCK. > > Especially for networking we need this CLOCK_TAI time as HW LaunchTime > need this (e.g. see qdisc's sch_etf.c and sch_taprio.c). I see. I interpreted the commit log that commit 3dc6ffae2da2 introduced TAI into the kernel for the first time. But it introduced the NMI safe version of it, right? > > > >> imo it would be better to annotate tk_core variable in vmlinux BTF. > >> Then progs will be able to read all possible timekeeper offsets. > > > > We are exposing APIs. APIs can be supported, but exposing implementation > > details creates ABIs of the worst sort because that prevents the kernel > > from changing the implementation. We've seen the fallout with the recent > > tracepoint changes already. > > Hmm... annotate tk_core variable in vmlinux BTF and letting BPF progs > access this seems like an unsafe approach and we tempt BPF-developers to > think other parts are okay to access. It is safe to access. Whether garbage will be read it's a different story. The following works (with lose definition of 'works'): extern const void tk_core __ksym; struct timekeeper { long long offs_tai; } __attribute__((preserve_access_index)); struct seqcount_raw_spinlock { } __attribute__((preserve_access_index)); long get_clock_tai(void) { long long off = 0; void *addr = (void *)&tk_core + ((bpf_core_type_size(struct seqcount_raw_spinlock) + 7) / 8) * 8 + bpf_core_field_offset(struct timekeeper, offs_tai); bpf_probe_read_kernel(&off, sizeof(off), addr); return bpf_ktime_get_ns() + off; } It's ugly, but no kernel changes are necessary. If you need to access clock_tai you can do so on older kernels too. Even on android 4.19 kernels. It's not foolproof. Future kernel changes will break it, but CO-RE will detect the breakage. The prog author would have to adjust the prog every time. People do these kinds of tricks all the time. Note that above was possible even before CO-RE came to existence. The first day bpf_probe_read_kernel() became available 8 years ago the tracing progs could read any kernel memory. Even earlier kprobes could read it for 10+ years. And in all those years bpf progs accessing kernel internals didn't freeze kernel development. bpf progs don't extend uapi surface unless the changes are in uapi/bpf.h. Anyway I guess new helper bpf_ktime_get_tai_ns() is ok, since it's so trivial, but selftest is necessary.