Re: [RFC Patch bpf-next] bpf: introduce bpf timer

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Tue, 27 Apr 2021 11:33:59 -0700

On Tue, Apr 27, 2021 at 9:36 AM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
>
> If we enforce this ownership, in case of conntrack the owner would be
> the program which sees the connection first, which is pretty much
> unpredictable. For example, if the ingress program sees a connection
> first, it installs a timer for this connection, but the traffic is
> bidirectional, hence egress program needs this connection and its
> timer too, we should not remove this timer when the ingress program
> is freed.

Sure. That's trivially achieved with pinning.
One can have an ingress prog that tailcalls into another prog
that arms the timer with one of its subprogs.
Egress prog can tailcall into the same prog as well.
The ingress and egress progs can be replaced one by one
or both removed together, and the middle prog stays alive
as long as it's pinned in bpffs or held by an FD.
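
To make the lifetime argument concrete, here is a rough userspace model
of the reference counting involved. It is only a sketch: model_prog and
its helpers are made-up names, not kernel or libbpf APIs, and it assumes
each holder (attachment, bpffs pin, open FD) contributes exactly one
reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these names are kernel or libbpf APIs. */
struct model_prog {
	int refcnt;   /* one per holder: attachment, bpffs pin, or open FD */
	bool loaded;  /* false once the last reference is dropped */
};

/* Take a reference on behalf of a new holder. */
static void model_prog_get(struct model_prog *p)
{
	p->refcnt++;
	p->loaded = true;
}

/* Drop a reference; the prog is freed when no holder remains. */
static void model_prog_put(struct model_prog *p)
{
	assert(p->refcnt > 0);
	if (--p->refcnt == 0)
		p->loaded = false;
}

/*
 * Ingress and egress are held only by their attachments; the middle prog
 * (the one whose subprog arms the timer) is held by a bpffs pin.
 * Returns true if the middle prog outlives removal of both entry progs.
 */
static bool middle_survives_entry_removal(void)
{
	struct model_prog ingress = {0}, egress = {0}, middle = {0};

	model_prog_get(&ingress);  /* attached at ingress */
	model_prog_get(&egress);   /* attached at egress */
	model_prog_get(&middle);   /* pinned in bpffs */

	model_prog_put(&ingress);  /* detach ingress */
	model_prog_put(&egress);   /* detach egress */

	return !ingress.loaded && !egress.loaded && middle.loaded;
}
```

In this model the pin is the only thing keeping the middle prog loaded,
so the entry progs can come and go without affecting it.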

> From another point of view: maps and programs are both first-class
> resources in eBPF, a timer is stored in a map and associated with a
> program, so it is naturally a first-class resource too.

Not really. The timer abstraction is about data. It invokes the callback.
That callback is a part of the program. The lifetime of the timer object
and lifetime of the callback can be different.
Obviously the timer logic needs to make sure that callback text is alive
when the timer is armed.
Combining timer and callback concepts creates a messy abstraction.
In the normal kernel code one can have a timer in any kernel data
structure and callback in the kernel text or in the kernel module.
The code needs to make sure that the module won't go away while
the timer is armed. Same thing with bpf progs. The progs are safe
kernel modules. The timers are independent objects.
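
A minimal userspace sketch of that separation, assuming the simplest
possible rule: arming a timer takes a reference on whatever owns the
callback text, and cancel or expiry drops it. All names here (cb_owner,
model_timer, and so on) are hypothetical and only model the idea; this
is not the kernel timer API or a proposed BPF interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these are not the kernel timer or BPF APIs. */
struct cb_owner {
	int refcnt;  /* nonzero means the callback text must stay loaded */
};

typedef void (*timer_cb_t)(void *data);

/* The timer is plain data; it can live in any structure, e.g. a map value. */
struct model_timer {
	struct cb_owner *owner;  /* holder of the callback text, NULL when idle */
	timer_cb_t cb;
	bool armed;
};

/* Arm the timer: pin the callback's owner for as long as it can fire. */
static int model_timer_arm(struct model_timer *t, struct cb_owner *o,
			   timer_cb_t cb)
{
	if (t->armed)
		return -1;  /* already armed */
	o->refcnt++;
	t->owner = o;
	t->cb = cb;
	t->armed = true;
	return 0;
}

/* Disarm and release the owner; shared by the cancel and expiry paths. */
static void model_timer_release(struct model_timer *t)
{
	assert(t->owner && t->owner->refcnt > 0);
	t->owner->refcnt--;
	t->owner = NULL;
	t->cb = NULL;
	t->armed = false;
}

static void model_timer_cancel(struct model_timer *t)
{
	if (t->armed)
		model_timer_release(t);
}

/* Expiry: run the callback while the owner is still pinned, then release. */
static void model_timer_fire(struct model_timer *t, void *data)
{
	if (!t->armed)
		return;
	t->cb(data);
	model_timer_release(t);
}

/* The owner (module or prog) may go away only when no armed timer uses it. */
static bool owner_can_unload(const struct cb_owner *o)
{
	return o->refcnt == 0;
}

/* Example callback: counts how many times it ran. */
static void count_cb(void *data)
{
	(*(int *)data)++;
}
```

Note that the model_timer object itself survives cancel and expiry; only
the reference on the callback's owner comes and goes with the armed state.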

> >
> > > >
> > > > Also if your colleagues have something to share they should be
> > > > posting to the mailing list. Right now you're acting as a broken phone
> > > > passing info back and forth and the knowledge gets lost.
> > > > Please ask your colleagues to participate online.
> > >
> > > They are already in CC from the very beginning. And our use case is
> > > public, it is Cilium conntrack:
> > > https://github.com/cilium/cilium/blob/master/bpf/lib/conntrack.h
> > >
> > > The entries of the code are:
> > > https://github.com/cilium/cilium/blob/master/bpf/bpf_lxc.c
> > >
> > > The maps for conntrack are:
> > > https://github.com/cilium/cilium/blob/master/bpf/lib/conntrack_map.h
> >
> > If that's the only goal then kernel timers are not needed.
> > cilium conntrack works well as-is.
>
> We don't go back to why user-space cleanup is inefficient again,
> do we? ;)

I remain unconvinced that cilium conntrack _needs_ timer apis.
It works fine in production and I don't hear any complaints
from cilium users. So 'user space cleanup inefficiencies' is
very subjective and cannot be the reason to add timer apis.

> More importantly, although conntrack is our use case, we don't
> design timers just for our case, obviously. Timers must be as flexible
> to use as possible, to allow other future use cases.

Right. That's why I'm asking for an explanation of a specific use case.
"we want to do cilium conntrack but differently" is not a reason.