On Mon, May 24, 2021 at 8:16 PM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
>
> On Sun, May 23, 2021 at 9:01 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Fri, May 21, 2021 at 2:37 PM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
> > >
> > > Hi, Alexei
> > >
> > > On Thu, May 20, 2021 at 11:52 PM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > Introduce 'struct bpf_timer' that can be embedded in most BPF map types
> > > > and helpers to operate on it:
> > > > long bpf_timer_init(struct bpf_timer *timer, void *callback, int flags)
> > > > long bpf_timer_mod(struct bpf_timer *timer, u64 msecs)
> > > > long bpf_timer_del(struct bpf_timer *timer)
> > >
> > > Like we discussed, this approach would make the timer harder
> > > to be independent of other eBPF programs, which is a must-have
> > > for both of our use cases (mine and Jamal's). Like you explained,
> > > this requires at least another program array, a tail call, a mandatory
> > > prog pinning to work.
> >
> > That is simply not true.
>
> Which part is not true? The above is what I got from your explanation.

I tried to sketch some code that uses your timer to implement our
conntrack logic. The sketch below shows how difficult it is to use, and
it does not even include the user-space part where eBPF programs are put
into the program array.

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1000);
	__type(key, struct tuple);
	__type(value, struct foo);
} conntrack SEC(".maps");

struct map_elem {
	struct bpf_timer timer;
	struct bpf_map *target;
	u32 expires;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1000);
	__type(key, int);
	__type(value, struct map_elem);
} timers SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(key_size, sizeof(u32));
	__uint(value_size, sizeof(u32));
	__uint(max_entries, 8);
} jmp_table SEC(".maps");

/* Per-element callback: drop conntrack entries that have expired. */
static __u64
cleanup_conntrack(struct bpf_map *map, struct tuple *key, struct foo *val,
		  struct callback_ctx *data)
{
	if (val->expires < now)
		bpf_map_delete_elem(&conntrack, key);
	return 0;
}

static int timer_cb(struct bpf_map *map, int *key, struct map_elem *val)
{
	bpf_for_each_map_elem(val->target, cleanup_conntrack, ....);
	/* re-arm the timer again to execute after 1 msec */
	bpf_timer_mod(&val->timer, 1);
	return 0;
}

SEC("prog/0")
int install_timer(void)
{
	struct map_elem *val;
	int key = 0;

	val = bpf_map_lookup_elem(&timers, &key);
	if (val) {
		bpf_timer_init(&val->timer, timer_cb, 0);
		bpf_timer_mod(&val->timer, val->expires);
	}
	return 0;
}

SEC("prog/1")
int mod_timer(void)
{
	struct map_elem *val;
	int key = 0;

	val = bpf_map_lookup_elem(&timers, &key);
	if (val) {
		// XXX: how do we know if a timer has been installed?
		bpf_timer_mod(&val->timer, val->expires);
	}
	return 0;
}

SEC("ingress")
void ingress(struct __sk_buff *skb)
{
	struct tuple tuple;
	struct foo *val;
	int key = 0;

	// extract tuple from skb

	if (bpf_map_lookup_elem(&timers, &key) == NULL)
		bpf_tail_call(skb, &jmp_table, 0);
	// here is not reachable unless failure
	val = bpf_map_lookup_elem(&conntrack, &tuple);
	if (val && val->expires < now) {
		bpf_tail_call(skb, &jmp_table, 1);
		// here is not reachable unless failure
	}
}

SEC("egress")
void egress(struct __sk_buff *skb)
{
	struct tuple tuple;
	struct foo *val;
	int key = 0;

	// extract tuple from skb

	if (bpf_map_lookup_elem(&timers, &key) == NULL)
		bpf_tail_call(skb, &jmp_table, 0);
	// here is not reachable unless failure
	val = bpf_map_lookup_elem(&conntrack, &tuple);
	if (val && val->expires < now) {
		bpf_tail_call(skb, &jmp_table, 1);
		// here is not reachable unless failure
	}
}
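
For reference, the expiry sweep that timer_cb() and cleanup_conntrack()
are meant to perform can be modeled in plain user-space C. This is only a
sketch under assumptions: struct entry, sweep_expired() and the flat
table are hypothetical stand-ins for the conntrack map and are not part
of the proposed BPF API. It shows just the "delete everything whose
expiry is before now" step, without any timer, map or tail-call plumbing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one conntrack map value. */
struct entry {
	bool in_use;      /* slot currently holds a live connection */
	uint64_t expires; /* absolute expiry time, arbitrary units */
};

/*
 * Hypothetical helper: mark every live entry whose expiry time is
 * strictly before `now` as free, mirroring the `expires < now` test
 * in the sketch above. Returns the number of entries removed.
 */
static size_t sweep_expired(struct entry *table, size_t n, uint64_t now)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		if (table[i].in_use && table[i].expires < now) {
			table[i].in_use = false;
			removed++;
		}
	}
	return removed;
}
```

In the BPF version this loop body would be the per-element callback, and
the caller would be whatever fires periodically, which is exactly the
part under discussion in this thread.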