On Fri, Apr 2, 2021 at 12:28 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Wed, Mar 31, 2021 at 09:26:35PM -0700, Cong Wang wrote: > > > This patch introduces a bpf timer map and a syscall to create bpf timer > > from user-space. > > That will severely limit timer api usability. > I agree with Song here. If user space has to create it there is no reason > to introduce new sys_bpf command. Just do all timers in user space > and trigger bpf prog via bpf_prog_test_run cmd. I think you misunderstand my point, the reason why the creation is done in user-space is not I like it, it is because it looks impossible to create timers in kernel-space, hence I have to move it to user-space. > > > > > The reason why we have to use a map is because the lifetime of a timer, > > without a map, we have to delete the timer before exiting the eBPF program, > > this would significately limit its use cases. With a map, the timer can > > stay as long as the map itself and can be actually updated via map update > > API's too, > > this part is correct. > > > where the key is the timer ID and the value is the timer expire > > timer. > > The timer ID is unnecessary. We cannot introduce new IDR for every new > bpf object. It doesn't scale. The IDR is per map, not per timer. > > > Timer creation is not easy either. In order to prevent users creating a > > timer but not adding it to a map, we have to enforce this in the API which > > takes a map parameter and adds the new timer into the map in one shot. > > Not quite true. The timer memory should be a part of the map otherwise > the timer life time is hard to track. But arming the timer and initializing > it with a callback doesn't need to be tied with allocation of timer memory. > > > And because timer is asynchronous, we can not just use its callback like > > bpf_for_each_map_elem(). > > Not quite. We can do it the same way as bpf_for_each_map_elem() despite > being async. Well, at least bpf_for_each_map_elem() can use stack variables etc., but timers can't. They are very different to me. > > > More importantly, we have to properly reference > > count its struct bpf_prog too. > > It's true that callback prog or subprog has to stay alive while timer > is alive. > Traditional maps can live past the time of the progs that use them. > Like bpf prog can load with a pointer to already created hash map. > Then prog can unload and hashmap will stay around just fine. > All maps are like this with the exception of prog_array. > The progs get deleted from the prog_array map when appropriate. > The same thing can work for maps with embedded timers. > For example the subprog/prog can to be deleted from the timer if > that prog is going away. Similar to ref/uref distinction we have for prog_array. > > > It seems impossible to do this either in > > verifier or in JIT, so we have to make its callback code a separate eBPF > > program and pass a program fd from user-space. Fortunately, timer callback > > can still live in the same object file with the rest eBPF code and share > > data too. > > > > Here is a quick demo of the timer callback code: > > > > static __u64 > > check_expired_elem(struct bpf_map *map, __u32 *key, __u64 *val, > > int *data) > > { > > u64 expires = *val; > > > > if (expires < bpf_jiffies64()) { > > bpf_map_delete_elem(map, key); > > *data++; > > } > > return 0; > > } > > > > SEC("timer") > > u32 timer_callback(void) > > { > > int count = 0; > > > > bpf_for_each_map_elem(&map, check_expired_elem, &count, 0); > > if (count) > > return 0; // not re-arm this timer > > else > > return 10; // reschedule this timeGr after 10 jiffies > > } > > As Song pointed out the exact same thing can be done with timers in user space > and user space triggering prog exec with bpf_prog_test_run. > > Here is how more general timers might look like: > https://lore.kernel.org/bpf/20210310011905.ozz4xahpkqbfkkvd@xxxxxxxxxxxxxxxxxxxxxxxxxxxx/ > > include/uapi/linux/bpf.h: > struct bpf_timer { > u64 opaque; > }; > The 'opaque' field contains a pointer to dynamically allocated struct timer_list and other data. This is my initial design as we already discussed, it does not work, please see below. > > The prog would do: > struct map_elem { > int stuff; > struct bpf_timer timer; > }; > > struct { > __uint(type, BPF_MAP_TYPE_HASH); > __uint(max_entries, 1); > __type(key, int); > __type(value, struct map_elem); > } hmap SEC(".maps"); > > static int timer_cb(struct map_elem *elem) > { > if (whatever && elem->stuff) > bpf_timer_mod(&elem->timer, new_expire); > } > > int bpf_timer_test(...) > { > struct map_elem *val; > > val = bpf_map_lookup_elem(&hmap, &key); > if (val) { > bpf_timer_init(&val->timer, timer_cb, flags); > val->stuff = 123; > bpf_timer_mod(&val->timer, expires); > } > } > > bpf_map_update_elem() either from bpf prog or from user space > allocates map element and zeros 8 byte space for the timer pointer. > bpf_timer_init() allocates timer_list and stores it into opaque if opaque == 0. > The validation of timer_cb() is done by the verifier. > bpf_map_delete_elem() either from bpf prog or from user space > does del_timer() if elem->opaque != 0. > If prog refers such hmap as above during prog free the kernel does > for_each_map_elem {if (elem->opaque) del_timer().} > I think that is the simplest way of prevent timers firing past the prog life time. > There could be other ways to solve it (like prog_array and ref/uref). > > Pseudo code: > int bpf_timer_init(struct bpf_timer *timer, void *timer_cb, int flags) > { > if (timer->opaque) > return -EBUSY; > t = alloc timer_list > t->cb = timer_cb; > t->.. > timer->opaque = (long)t; > } > > int bpf_timer_mod(struct bpf_timer *timer, u64 expires) > { > if (!time->opaque) > return -EINVAL; > t = (struct timer_list *)timer->opaque; > mod_timer(t,..); > } > > int bpf_timer_del(struct bpf_timer *timer) > { > if (!time->opaque) > return -EINVAL; > t = (struct timer_list *)timer->opaque; > del_timer(t); > } > > The verifier would need to check that 8 bytes occupied by bpf_timer and not accessed > via load/store by the program. The same way it does it for bpf_spin_lock. This does not work, because bpf_timer_del() has to be matched with bpf_timer_init(), otherwise we would leak timer resources. For example: SEC("foo") bad_ebpf_code() { struct bpf_timer t; bpf_timer_init(&t, ...); // allocate a timer bpf_timer_mod(&t, ..); // end of BPF program // now the timer is leaked, no one will delete it } We can not enforce the matching in the verifier, because users would have to call bpf_timer_del() before exiting, which is not what we want either. Thanks.