On Tue, Jan 26, 2021 at 11:00 PM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
> > >                 ret = PTR_ERR(l_new);
> > > +               if (ret == -EAGAIN) {
> > > +                       htab_unlock_bucket(htab, b, hash, flags);
> > > +                       htab_gc_elem(htab, l_old);
> > > +                       mod_delayed_work(system_unbound_wq, &htab->gc_work, 0);
> > > +                       goto again;
> >
> > Also this one looks rather worrying, so the BPF prog is stalled here, loop-waiting
> > in (e.g. XDP) hot path for system_unbound_wq to kick in to make forward progress?
>
> In this case, the old one is scheduled for removal in GC, we just wait for GC
> to finally remove it. It won't stall unless GC itself or the worker scheduler is
> wrong, both of which should be kernel bugs.
>
> If we don't do this, users would get a -E2BIG when it is not too big. I don't
> know a better way to handle this sad situation, maybe returning -EBUSY
> to users and let them call again?

I think using wq for timers is a non-starter.
Tying a hash/lru map to a timer is not a good idea either.

I think timers have to be done as independent objects, similar to
how the kernel uses them.
Then there will be no question whether an lru or a hash map needs them.
The bpf prog author will be able to use timers with either.
The prog will be able to use timers without hash maps too.

I'm proposing a timer map where each object will go through
bpf_timer_setup(timer, callback, flags);
where "callback" is a bpf subprogram.
The corresponding bpf_del_timer and bpf_mod_timer would work the same way
they do in the kernel.
The tricky part is the kernel style of using from_timer() to access the
object with additional info.
I think a bpf timer map can model it the same way.
At map creation time the value_size will specify the amount of extra
bytes necessary.
Another alternative is to pass an extra data argument to the callback.
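
To make the from_timer() point concrete, here is a minimal userspace sketch of the pattern, assuming a timer embedded at the start of a larger value whose extra bytes carry per-object state. Everything in it is hypothetical and for illustration only: struct mock_timer, struct flow_entry, the mock_* helpers and the local container_of macro are stand-ins, not the kernel timer API and not the proposed bpf_timer_setup/bpf_mod_timer/bpf_del_timer helpers. The return values of the mock helpers are only loosely modeled on the kernel's convention of reporting whether the timer was pending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for container_of(); not the kernel macro itself. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical timer object. Not a kernel or BPF type. */
struct mock_timer {
	void (*callback)(struct mock_timer *t);
	uint64_t expires;
	unsigned int flags;
	int pending;
};

/*
 * Hypothetical map value: the timer plus the "extra bytes" that
 * value_size would reserve at map creation time.
 */
struct flow_entry {
	struct mock_timer timer;
	uint32_t flow_id;
	uint32_t hits;
};

/* Bind a callback to the timer without arming it. */
static void mock_timer_setup(struct mock_timer *t,
			     void (*cb)(struct mock_timer *),
			     unsigned int flags)
{
	t->callback = cb;
	t->flags = flags;
	t->expires = 0;
	t->pending = 0;
}

/* Arm or re-arm the timer; returns 1 if it was already pending. */
static int mock_mod_timer(struct mock_timer *t, uint64_t expires)
{
	int was_pending = t->pending;

	t->expires = expires;
	t->pending = 1;
	return was_pending;
}

/* Disarm the timer; returns 1 if it was pending. */
static int mock_del_timer(struct mock_timer *t)
{
	int was_pending = t->pending;

	t->pending = 0;
	return was_pending;
}

/* Run the callback if the timer is due at time 'now'; returns 1 if it ran. */
static int mock_run_timer(struct mock_timer *t, uint64_t now)
{
	if (!t->pending || now < t->expires)
		return 0;
	t->pending = 0;
	t->callback(t);
	return 1;
}

/* The callback only receives the timer and recovers its enclosing entry. */
static void flow_expired(struct mock_timer *t)
{
	struct flow_entry *e = container_of(t, struct flow_entry, timer);

	assert(&e->timer == t);
	e->hits = 0;	/* act on the extra per-object data */
}
```

A caller would initialize a struct flow_entry, pass &entry.timer to mock_timer_setup() with flow_expired as the callback, and arm it with mock_mod_timer(). The alternative mentioned in the mail, an extra data argument to the callback, would replace the container_of step with an explicit pointer parameter.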