On 4/21/23 6:17 PM, Kumar Kartikeya Dwivedi wrote:
> On Sat, Apr 15, 2023 at 10:18:11PM CEST, Dave Marchevsky wrote:
>> Test refcounted local kptr functionality added in previous patches in
>> the series.
>>
>> Usecases which pass verification:
>>
>> * Add refcounted local kptr to both tree and list. Then, read and -
>>   possibly, depending on test variant - delete from tree, then list.
>> * Also test doing read-and-maybe-delete in the opposite order
>> * Stash a refcounted local kptr in a map_value, then add it to an
>>   rbtree. Read from both, possibly deleting after tree read.
>> * Add refcounted local kptr to both tree and list. Then, try reading and
>>   deleting twice from one of the collections.
>> * bpf_refcount_acquire of a just-added non-owning ref should work, as
>>   should bpf_refcount_acquire of an owning ref just out of bpf_obj_new
>>
>> Usecases which fail verification:
>>
>> * The simple successful bpf_refcount_acquire cases from above should
>>   both fail to verify if the newly-acquired owning ref is not dropped
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
>> ---
>> [...]
>> +SEC("?tc")
>> +__failure __msg("Unreleased reference id=3 alloc_insn=21")
>> +long rbtree_refcounted_node_ref_escapes(void *ctx)
>> +{
>> +	struct node_acquire *n, *m;
>> +
>> +	n = bpf_obj_new(typeof(*n));
>> +	if (!n)
>> +		return 1;
>> +
>> +	bpf_spin_lock(&glock);
>> +	bpf_rbtree_add(&groot, &n->node, less);
>> +	/* m becomes an owning ref but is never drop'd or added to a tree */
>> +	m = bpf_refcount_acquire(n);
>
> I am analyzing the set (and I'll reply in detail to the cover letter), but this
> stood out.
>
> Isn't this going to be problematic if n has refcount == 1 and is dropped
> internally by bpf_rbtree_add? Are we sure this can never occur? It took me some
> time, but the following schedule seems problematic.
>
> CPU 0                                  CPU 1
> n = bpf_obj_new
> lock(lock1)
> bpf_rbtree_add(rbtree1, n)
> m = bpf_refcount_acquire(n)
> unlock(lock1)
>
> kptr_xchg(map, m) // move to map
> // at this point, refcount = 2
>                                        m = kptr_xchg(map, NULL)
>                                        lock(lock2)
> lock(lock1)                            bpf_rbtree_add(rbtree2, m)
> p = bpf_rbtree_first(rbtree1)          if (!RB_EMPTY_NODE) bpf_obj_drop_impl(m) // A
> bpf_rbtree_remove(rbtree1, p)
> unlock(lock1)
> bpf_obj_drop(p) // B
>                                        bpf_refcount_acquire(m) // use-after-free
>                                        ...
>
> B will decrement the refcount from 1 to 0, after which bpf_refcount_acquire is
> basically performing a use-after-free (when fortunate, one will get a
> WARN_ON_ONCE splat for the 0-to-1 transition; otherwise, a silent refcount bump
> on some different object).

Thanks for the detailed feedback here and in the other thread in the series. I
will address the issues you raised ASAP, starting with this one, which I've
confirmed via a repro selftest. Will be sending fixes soon.
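
For concreteness, the repro exercises the CPU 1 side of your schedule from a
BPF program along these lines. This is a minimal sketch, not the actual
selftest: it reuses the node_acquire / less / glock / groot definitions (and
includes, and the private() macro) from the quoted test's file, and the second
lock+tree (glock2/groot2) and stash map are illustrative names for this
writeup, not the final test's.

/* Hypothetical CPU 1 program, mirroring the schedule above. Assumes the
 * same includes and the node_acquire/less/glock/groot definitions from
 * the quoted selftest; glock2/groot2 and the stash map are made-up names.
 */
struct map_value {
	struct node_acquire __kptr *node;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, int);
	__type(value, struct map_value);
	__uint(max_entries, 1);
} stash SEC(".maps");

private(B) struct bpf_spin_lock glock2;
private(B) struct bpf_rb_root groot2 __contains(node_acquire, node);

SEC("?tc")
long refcount_acquire_uaf_cpu1(void *ctx)
{
	struct node_acquire *m;
	struct map_value *v;
	int key = 0;

	v = bpf_map_lookup_elem(&stash, &key);
	if (!v)
		return 1;

	/* m = kptr_xchg(map, NULL): take over the owning ref that the
	 * CPU 0 program stashed after its bpf_refcount_acquire.
	 */
	m = bpf_kptr_xchg(&v->node, NULL);
	if (!m)
		return 2;

	bpf_spin_lock(&glock2);
	/* The node is still in rbtree1, so RB_EMPTY_NODE is false and
	 * bpf_rbtree_add internally drops m's reference (step A in the
	 * schedule), while the verifier still treats m as a usable
	 * non-owning ref for the rest of the critical section.
	 */
	bpf_rbtree_add(&groot2, &m->node, less);
	/* Once CPU 0's rbtree_remove + bpf_obj_drop (step B) takes the
	 * refcount 1 -> 0 and frees the object, this acquire touches
	 * freed memory at runtime despite passing verification.
	 */
	m = bpf_refcount_acquire(m);
	bpf_spin_unlock(&glock2);

	bpf_obj_drop(m);
	return 0;
}

The interleaving itself is driven from userspace by racing this against a
program that does CPU 0's add/remove/drop sequence; with the right timing the
bpf_refcount_acquire above manipulates freed memory even though both programs
pass the verifier.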