On Wed, Mar 23, 2022 at 02:21:56AM IST, Andrii Nakryiko wrote: > On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi > <memxor@xxxxxxxxx> wrote: > > > > A destructor kfunc can be defined as void func(type *), where type may > > be void or any other pointer type as per convenience. > > > > In this patch, we ensure that the type is sane and capture the function > > pointer into off_desc of ptr_off_tab for the specific pointer offset, > > with the invariant that the dtor pointer is always set when 'kptr_ref' > > tag is applied to the pointer's pointee type, which is indicated by the > > flag BPF_MAP_VALUE_OFF_F_REF. > > > > Note that only BTF IDs whose destructor kfunc is registered, thus become > > the allowed BTF IDs for embedding as referenced kptr. Hence it serves > > the purpose of finding dtor kfunc BTF ID, as well acting as a check > > against the whitelist of allowed BTF IDs for this purpose. > > > > Finally, wire up the actual freeing of the referenced pointer if any at > > all available offsets, so that no references are leaked after the BPF > > map goes away and the BPF program previously moved the ownership a > > referenced pointer into it. > > > > The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem > > will free any existing referenced kptr. The same case is with LRU map's > > bpf_lru_push_free/htab_lru_push_free functions, which are extended to > > reset unreferenced and free referenced kptr. > > > > Note that unlike BPF timers, kptr is not reset or freed when map uref > > drops to zero. > > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> > > --- > > include/linux/bpf.h | 4 ++ > > include/linux/btf.h | 2 + > > kernel/bpf/arraymap.c | 14 ++++++- > > kernel/bpf/btf.c | 86 ++++++++++++++++++++++++++++++++++++++++++- > > kernel/bpf/hashtab.c | 29 ++++++++++----- > > kernel/bpf/syscall.c | 57 +++++++++++++++++++++++++--- > > 6 files changed, 173 insertions(+), 19 deletions(-) > > > > [...] > > > + /* This call also serves as a whitelist of allowed objects that > > + * can be used as a referenced pointer and be stored in a map at > > + * the same time. > > + */ > > + dtor_btf_id = btf_find_dtor_kfunc(off_btf, id); > > + if (dtor_btf_id < 0) { > > + ret = dtor_btf_id; > > + btf_put(off_btf); > > do btf_put() in end section instead of copy/pasting it in every single > branch here and below? > Ok. > > + goto end; > > + } > > + > > + dtor_func = btf_type_by_id(off_btf, dtor_btf_id); > > + if (!dtor_func || !btf_type_is_func(dtor_func)) { > > + ret = -EINVAL; > > + btf_put(off_btf); > > + goto end; > > + } > > + > > [...] > > > - while (tab->nr_off--) > > + while (tab->nr_off--) { > > btf_put(tab->off[tab->nr_off].btf); > > + if (tab->off[tab->nr_off].module) > > + module_put(tab->off[tab->nr_off].module); > > + } > > kfree(tab); > > return ERR_PTR(ret); > > } > > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c > > index 65877967f414..fa4a0a8754c5 100644 > > --- a/kernel/bpf/hashtab.c > > +++ b/kernel/bpf/hashtab.c > > @@ -725,12 +725,16 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map, > > return insn - insn_buf; > > } > > > > -static void check_and_free_timer(struct bpf_htab *htab, struct htab_elem *elem) > > +static void check_and_free_timer_and_kptr(struct bpf_htab *htab, > > we'll need to rename this to > check_and_free_timer_and_kptrs_and_dynptrs() pretty soon, so let's > better figure out more generic name now? :) > > Don't know, something like "release_fields" or something? > Ok, will change. > > + struct htab_elem *elem, > > + bool free_kptr) > > { > > + void *map_value = elem->key + round_up(htab->map.key_size, 8); > > + > > if (unlikely(map_value_has_timer(&htab->map))) > > - bpf_timer_cancel_and_free(elem->key + > > - round_up(htab->map.key_size, 8) + > > - htab->map.timer_off); > > + bpf_timer_cancel_and_free(map_value + htab->map.timer_off); > > + if (unlikely(map_value_has_kptr(&htab->map)) && free_kptr) > > + bpf_map_free_kptr(&htab->map, map_value); > > kptrs (please use plural consistently for functions that actually > handle multiple kptrs). > Ok, will audit all other places as well. > > } > > > > /* It is called from the bpf_lru_list when the LRU needs to delete > > [...] > > > static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem) > > { > > - check_and_free_timer(htab, elem); > > + check_and_free_timer_and_kptr(htab, elem, true); > > bpf_lru_push_free(&htab->lru, &elem->lru_node); > > } > > > > @@ -1420,7 +1424,10 @@ static void htab_free_malloced_timers(struct bpf_htab *htab) > > struct htab_elem *l; > > > > hlist_nulls_for_each_entry(l, n, head, hash_node) > > - check_and_free_timer(htab, l); > > + /* We don't reset or free kptr on uref dropping to zero, > > + * hence set free_kptr to false. > > + */ > > + check_and_free_timer_and_kptr(htab, l, false); > > ok, now reading this, I wonder if it's better to keep timer and kptrs > clean ups separate? And then dynptrs separate still? Instead of adding > all these flags. Right, in case of array map, we directly called bpf_timer_cancel_and_free, instead of going through the function named like this, I guess it makes sense to do the same for hash map, since I assume both kptrs and dynptrs wouldn't want to be freed on map_release_uref, unlike timers. > > > cond_resched_rcu(); > > } > > rcu_read_unlock(); > > @@ -1430,6 +1437,7 @@ static void htab_map_free_timers(struct bpf_map *map) > > { > > struct bpf_htab *htab = container_of(map, struct bpf_htab, map); > > > > + /* We don't reset or free kptr on uref dropping to zero. */ > > if (likely(!map_value_has_timer(&htab->map))) > > return; > > if (!htab_is_prealloc(htab)) > > [...] -- Kartikeya