On Wed, Dec 27, 2023 at 12:20 AM D. Wythe <alibuda@xxxxxxxxxxxxxxxxx> wrote: > > > Hi Alexei, > > > IMMO, nf_unregister_net_hook does not wait for the completion of the > execution of the hook that is being removed, > instead, it allocates a new array without the very hook to replace the > old arrayvia rcu_assign_pointer() (in __nf_hook_entries_try_shrink), > then it use call_rcu() to release the old one. > > You can find more details in commit > 8c873e2199700c2de7dbd5eedb9d90d5f109462b. > > In other words, when nf_unregister_net_hook returns, there may still be > contexts executing hooks on the > old array, which means that the `link` may still be accessed after > nf_unregister_net_hook returns. > > And that's the reason why we use kfree_rcu() to release the `link`. > >> nf_hook_run_bpf > >> const struct > >> bpf_nf_link *nf_link = bpf_link; > >> > >> bpf_nf_link_release > >> nf_unregister_net_hook(nf_link->net, &nf_link->hook_ops); > >> > >> bpf_nf_link_dealloc > >> free(link) > >> bpf_prog_run(link->prog); Got it. Sounds like it's an existing bug. If so it should be an independent patch with Fixes tag. Also please craft a test case to demonstrate UAF. > > I must admit that it is indeed feasible if we eliminate the mutex and > use cmpxchg to swap the prog (we need to ensure that there is only one > bpf_prog_put() on the old prog). > However, when cmpxchg fails, it means that this context has not > outcompeted the other one, and we have to return a failure. Maybe > something like this: > > if (!cmpxchg(&link->prog, old_prog, new_prog)) { > /* already replaced by another link_update */ > return -xxx; > } > > As a comparison, The version with the mutex wouldn't encounter this > error, every update would succeed. I think that it's too harsh for the > user to receive a failure > in that case since they haven't done anything wrong. Disagree. The mutex doesn't prevent this issue. There is always a race. It happens when link_update.old_prog_fd and BPF_F_REPLACE were specified. One user space passes an FD of the old prog and another user space doing the same. They both race and one of them gets if (old_prog && link->prog != old_prog) { err = -EPERM; it's no different with dropping the mutex and doing: if (old_prog) { if (!cmpxchg(&link->prog, old_prog, new_prog)) -EPERM } else { old_prog = xchg(&link->prog, new_prog); }