Daniel Borkmann wrote: > Pedro Pinto and later independently also Hyunwoo Kim and Wongi Lee reported > an issue that the tcx_entry can be released too early leading to a use > after free (UAF) when an active old-style ingress or clsact qdisc with a > shared tc block is later replaced by another ingress or clsact instance. > > Essentially, the sequence to trigger the UAF (one example) can be as follows: > > 1. A network namespace is created > 2. An ingress qdisc is created. This allocates a tcx_entry, and > &tcx_entry->miniq is stored in the qdisc's miniqp->p_miniq. At the > same time, a tcf block with index 1 is created. > 3. chain0 is attached to the tcf block. chain0 must be connected to > the block linked to the ingress qdisc to later reach the function > tcf_chain0_head_change_cb_del() which triggers the UAF. > 4. Create and graft a clsact qdisc. This causes the ingress qdisc > created in step 1 to be removed, thus freeing the previously linked > tcx_entry: > > rtnetlink_rcv_msg() > => tc_modify_qdisc() > => qdisc_create() > => clsact_init() [a] > => qdisc_graft() > => qdisc_destroy() > => __qdisc_destroy() > => ingress_destroy() [b] > => tcx_entry_free() > => kfree_rcu() // tcx_entry freed > > 5. Finally, the network namespace is closed. This registers the > cleanup_net worker, and during the process of releasing the > remaining clsact qdisc, it accesses the tcx_entry that was > already freed in step 4, causing the UAF to occur: > > cleanup_net() > => ops_exit_list() > => default_device_exit_batch() > => unregister_netdevice_many() > => unregister_netdevice_many_notify() > => dev_shutdown() > => qdisc_put() > => clsact_destroy() [c] > => tcf_block_put_ext() > => tcf_chain0_head_change_cb_del() > => tcf_chain_head_change_item() > => clsact_chain_head_change() > => mini_qdisc_pair_swap() // UAF > > There are also other variants, the gist is to add an ingress (or clsact) > qdisc with a specific shared block, then to replace that qdisc, waiting > for the tcx_entry kfree_rcu() to be executed and subsequently accessing > the current active qdisc's miniq one way or another. > > The correct fix is to turn the miniq_active boolean into a counter. What > can be observed, at step 2 above, the counter transitions from 0->1, at > step [a] from 1->2 (in order for the miniq object to remain active during > the replacement), then in [b] from 2->1 and finally [c] 1->0 with the > eventual release. The reference counter in general ranges from [0,2] and > it does not need to be atomic since all access to the counter is protected > by the rtnl mutex. With this in place, there is no longer a UAF happening > and the tcx_entry is freed at the correct time. > > Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support") > Reported-by: Pedro Pinto <xten@xxxxxxx> > Co-developed-by: Pedro Pinto <xten@xxxxxxx> > Signed-off-by: Pedro Pinto <xten@xxxxxxx> > Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx> > Cc: Hyunwoo Kim <v4bel@xxxxxxxxx> > Cc: Wongi Lee <qwerty@xxxxxxxxx> > Cc: Martin KaFai Lau <martin.lau@xxxxxxxxxx> > --- Acked-by: John Fastabend <john.fastabend@xxxxxxxxx>