On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote: > > A destructor kfunc can be defined as void func(type *), where type may > be void or any other pointer type as per convenience. > > In this patch, we ensure that the type is sane and capture the function > pointer into off_desc of ptr_off_tab for the specific pointer offset, > with the invariant that the dtor pointer is always set when 'kptr_ref' > tag is applied to the pointer's pointee type, which is indicated by the > flag BPF_MAP_VALUE_OFF_F_REF. > > Note that only BTF IDs whose destructor kfunc is registered, thus become > the allowed BTF IDs for embedding as referenced kptr. Hence it serves > the purpose of finding dtor kfunc BTF ID, as well acting as a check > against the whitelist of allowed BTF IDs for this purpose. > > Finally, wire up the actual freeing of the referenced pointer if any at > all available offsets, so that no references are leaked after the BPF > map goes away and the BPF program previously moved the ownership a > referenced pointer into it. > > The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem > will free any existing referenced kptr. The same case is with LRU map's > bpf_lru_push_free/htab_lru_push_free functions, which are extended to > reset unreferenced and free referenced kptr. > > Note that unlike BPF timers, kptr is not reset or freed when map uref > drops to zero. > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> > --- > include/linux/bpf.h | 4 ++ > include/linux/btf.h | 2 + > kernel/bpf/arraymap.c | 14 ++++++- > kernel/bpf/btf.c | 86 ++++++++++++++++++++++++++++++++++++++++++- > kernel/bpf/hashtab.c | 29 ++++++++++----- > kernel/bpf/syscall.c | 57 +++++++++++++++++++++++++--- > 6 files changed, 173 insertions(+), 19 deletions(-) > [...] > + /* This call also serves as a whitelist of allowed objects that > + * can be used as a referenced pointer and be stored in a map at > + * the same time. > + */ > + dtor_btf_id = btf_find_dtor_kfunc(off_btf, id); > + if (dtor_btf_id < 0) { > + ret = dtor_btf_id; > + btf_put(off_btf); do btf_put() in end section instead of copy/pasting it in every single branch here and below? > + goto end; > + } > + > + dtor_func = btf_type_by_id(off_btf, dtor_btf_id); > + if (!dtor_func || !btf_type_is_func(dtor_func)) { > + ret = -EINVAL; > + btf_put(off_btf); > + goto end; > + } > + [...] > - while (tab->nr_off--) > + while (tab->nr_off--) { > btf_put(tab->off[tab->nr_off].btf); > + if (tab->off[tab->nr_off].module) > + module_put(tab->off[tab->nr_off].module); > + } > kfree(tab); > return ERR_PTR(ret); > } > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c > index 65877967f414..fa4a0a8754c5 100644 > --- a/kernel/bpf/hashtab.c > +++ b/kernel/bpf/hashtab.c > @@ -725,12 +725,16 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map, > return insn - insn_buf; > } > > -static void check_and_free_timer(struct bpf_htab *htab, struct htab_elem *elem) > +static void check_and_free_timer_and_kptr(struct bpf_htab *htab, we'll need to rename this to check_and_free_timer_and_kptrs_and_dynptrs() pretty soon, so let's better figure out more generic name now? :) Don't know, something like "release_fields" or something? > + struct htab_elem *elem, > + bool free_kptr) > { > + void *map_value = elem->key + round_up(htab->map.key_size, 8); > + > if (unlikely(map_value_has_timer(&htab->map))) > - bpf_timer_cancel_and_free(elem->key + > - round_up(htab->map.key_size, 8) + > - htab->map.timer_off); > + bpf_timer_cancel_and_free(map_value + htab->map.timer_off); > + if (unlikely(map_value_has_kptr(&htab->map)) && free_kptr) > + bpf_map_free_kptr(&htab->map, map_value); kptrs (please use plural consistently for functions that actually handle multiple kptrs). > } > > /* It is called from the bpf_lru_list when the LRU needs to delete [...] > static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem) > { > - check_and_free_timer(htab, elem); > + check_and_free_timer_and_kptr(htab, elem, true); > bpf_lru_push_free(&htab->lru, &elem->lru_node); > } > > @@ -1420,7 +1424,10 @@ static void htab_free_malloced_timers(struct bpf_htab *htab) > struct htab_elem *l; > > hlist_nulls_for_each_entry(l, n, head, hash_node) > - check_and_free_timer(htab, l); > + /* We don't reset or free kptr on uref dropping to zero, > + * hence set free_kptr to false. > + */ > + check_and_free_timer_and_kptr(htab, l, false); ok, now reading this, I wonder if it's better to keep timer and kptrs clean ups separate? And then dynptrs separate still? Instead of adding all these flags. > cond_resched_rcu(); > } > rcu_read_unlock(); > @@ -1430,6 +1437,7 @@ static void htab_map_free_timers(struct bpf_map *map) > { > struct bpf_htab *htab = container_of(map, struct bpf_htab, map); > > + /* We don't reset or free kptr on uref dropping to zero. */ > if (likely(!map_value_has_timer(&htab->map))) > return; > if (!htab_is_prealloc(htab)) [...]