Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes: > On Thu, Jun 03, 2021 at 12:21:05AM +0200, Toke Høiland-Jørgensen wrote: >> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes: >> >> > On Thu, May 27, 2021 at 06:57:17PM +0200, Toke Høiland-Jørgensen wrote: >> >> > if (val) { >> >> > bpf_timer_init(&val->timer, timer_cb2, 0); >> >> > bpf_timer_start(&val->timer, 1000 /* call timer_cb2 in 1 msec */); >> >> >> >> nit: there are 1M nanoseconds in a millisecond :) >> > >> > oops :) >> > >> >> > } >> >> > } >> >> > >> >> > This patch adds helper implementations that rely on hrtimers >> >> > to call bpf functions as timers expire. >> >> > The following patch adds necessary safety checks. >> >> > >> >> > Only programs with CAP_BPF are allowed to use bpf_timer. >> >> > >> >> > The amount of timers used by the program is constrained by >> >> > the memcg recorded at map creation time. >> >> > >> >> > The bpf_timer_init() helper is receiving hidden 'map' and 'prog' arguments >> >> > supplied by the verifier. The prog pointer is needed to do refcnting of bpf >> >> > program to make sure that program doesn't get freed while timer is armed. >> >> > >> >> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx> >> >> >> >> Overall this LGTM, and I believe it will be usable for my intended use >> >> case. One question: >> >> >> >> With this, it will basically be possible to create a BPF daemon, won't >> >> it? I.e., if a program includes a timer and the callback keeps re-arming >> >> itself this will continue indefinitely even if userspace closes all refs >> >> to maps and programs? Not saying this is a problem, just wanted to check >> >> my understanding (i.e., that there's not some hidden requirement on >> >> userspace keeping a ref open that I'm missing)... >> > >> > That is correct. >> > Another option would be to auto-cancel the timer when the last reference >> > to the prog drops. That may feel safer, since forever >> > running bpf daemon is a certainly new concept. >> > The main benefits of doing prog_refcnt++ from bpf_timer_start are ease >> > of use and no surprises. >> > Disappearing timer callback when prog unloads is outside of bpf prog control. >> > For example the tracing bpf prog might collect some data and periodically >> > flush it to user space. The prog would arm the timer and when callback >> > is invoked it would send the data via ring buffer and start another >> > data collection cycle. >> > When the user space part of the service exits it doesn't have >> > an ability to enforce the flush of the last chunk of data. >> > It could do prog_run cmd that will call the timer callback, >> > but it's racy. >> > The solution to this problem could be __init and __fini >> > sections that will be invoked when the prog is loaded >> > and when the last refcnt drops. >> > It's a complementary feature though. >> > The prog_refcnt++ from bpf_timer_start combined with a prog >> > explicitly doing bpf_timer_cancel from __fini >> > would be the most flexible combination. >> > This way the prog can choose to be a daemon or it can choose >> > to cancel its timers and do data flushing when the last prog >> > reference drops. >> > The prog refcnt would be split (similar to uref). The __fini callback >> > will be invoked when refcnt reaches zero, but all increments >> > done by bpf_timer_start will be counted separately. >> > The user space wouldn't need to do the prog_run command. >> > It would detach the prog and close(prog_fd). >> > That will trigger __fini callback that will cancel the timers >> > and the prog will be fully unloaded. >> > That would make bpf progs resemble kernel modules even more. >> >> I like the idea of a "destructor" that will trigger on refcnt drop to >> zero. And I do think a "bpf daemon" is potentially a useful, if novel, >> concept. > > I think so too. Long ago folks requested periodic bpf progs to > do sampling in tracing. All these years attaching bpf prog > to a perf_event was a workaround for such feature request. > perf_event bpf prog can be pinned in perf_event array, > so "bpf daemon" kinda exist today. Just more convoluted. Right, agreed - triggering periodic sampling directly from BPF does seem like the right direction. >> The __fini thing kinda supposes a well-behaved program, though, right? >> I.e., it would be fairly trivial to write a program that spins forever >> by repeatedly scheduling the timer with a very short interval (whether >> by malice or bugginess). > > It's already possible without bpf_timer. Hmm, fair point. >> So do we need a 'bpfkill' type utility to nuke >> buggy programs, or how would resource constraints be enforced? > > That is possible without 'bpfkill'. > bpftool can delete map element that contains bpf_timer and > that will cancel it. I'll add tests to make sure it's the case. Ah, right, of course! Thanks, LGTM then :) -Toke