Re: [PATCH v5 bpf-next 04/11] bpf: Add map side support for bpf timers.

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Thu, 8 Jul 2021 20:52:23 -0700

On Thu, Jul 08, 2021 at 06:51:19PM -0700, Martin KaFai Lau wrote:
> > +
> >  /* Called when map->refcnt goes to zero, either from workqueue or from syscall */
> >  static void array_map_free(struct bpf_map *map)
> >  {
> > @@ -382,6 +402,7 @@ static void array_map_free(struct bpf_map *map)
> >  	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
> >  		bpf_array_free_percpu(array);
> >  
> > +	array_map_free_timers(map);
> array_map_free() is called when map->refcnt reached 0.
> By then, map->usercnt should have reached 0 before
> and array_map_free_timers() should have already been called,
> so no need to call it here again?  The same goes for hashtab.

Not sure it's that simple.
Currently map->usercnt > 0 check is done for bpf_timer_set_callback only,
because prog refcnting is what matters to usercnt and map_release_uref scheme.
bpf_map_init doesn't have this check because there is no circular dependency
prog->map->timer->prog to worry about.
So after usercnt reached zero the prog can still do bpf_timer_init.
I guess we can add usercnt > 0 to bpf_timer_init as well.
Need to think whether it's enough and the race between atomic64_read(usercnt)
and atomic64_dec_and_test(usercnt) is addressed the same way as the race
in set_callback and cancel_and_free. So far looks like it. Hmm.

> > +static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> > +			  const char *name, int sz, int align)
> > +{
> > +
> > +	if (__btf_type_is_struct(t))
> > +		return btf_find_struct_field(btf, t, name, sz, align);
> > +	else if (btf_type_is_datasec(t))
> > +		return btf_find_datasec_var(btf, t, name, sz, align);
> iiuc, a global struct bpf_timer is not supported.  I am not sure
> why it needs to find the timer in datasec here. or it meant to error out
> and potentially give a verifier log?  I don't see where is the verifier
> reporting error though.

yes. Initially I coded it up to support timers in the global data, but
later Andrii convinced me that single global timer could be surprising to
users unless I hack it up in libbpf to create a bunch of global data maps
one map for each timer and teach libbpf to avoid doing mmap on them.
That's too hacky and can be done later.
So the datasec parsing code stayed only to have a meaningful error
in the verifier if the user wrote a program with global timer.
Without this code the verifier error:
 24: (85) call bpf_timer_init#169
 map 'my_test.bss' is not a struct type or bpf_timer is mangled
With:
 24: (85) call bpf_timer_init#169
 timer pointer in R1 map_uid=0 doesn't match map pointer in R2 map_uid=0
Imo that's much clearer. Since to use global bpf_timer_init() the
user would have to pass some map value in R2
and it wouldn't match the timer in R1.
So I prefer to keep this btf_find_datasec_var() code.

> 
> > +static void htab_free_malloced_timers(struct bpf_htab *htab)
> > +{
> > +	int i;
> > +
> > +	rcu_read_lock();
> > +	for (i = 0; i < htab->n_buckets; i++) {
> > +		struct hlist_nulls_head *head = select_bucket(htab, i);
> > +		struct hlist_nulls_node *n;
> > +		struct htab_elem *l;
> > +
> > +		hlist_nulls_for_each_entry(l, n, head, hash_node)
> need the _rcu() variant here.

argh. right.

> May be put rcu_read_lock/unlock() in the loop and do a
> cond_resched() in case the hashtab is large.

Feels a bit like premature optimization. delete_all_elements()
loop right above is doing similar work without cond_resched.
I don't mind cond_resched. I just don't see how to cleanly add it
without breaking rcu_read_lock and overcomplicating the code.