Stanislav Fomichev <sdf@xxxxxxxxxx> writes: > On Tue, Nov 29, 2022 at 12:50 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: >> >> Stanislav Fomichev <sdf@xxxxxxxxxx> writes: >> >> > Please see the first patch in the series for the overall >> > design and use-cases. >> > >> > Changes since v2: >> > >> > - Rework bpf_prog_aux->xdp_netdev refcnt (Martin) >> > >> > Switched to dropping the count early, after loading / verification is >> > done. At attach time, the pointer value is used only for comparing >> > the actual netdev at attach vs netdev at load. >> >> So if we're not holding the netdev reference, we'll end up with a BPF >> program with hard-coded CALL instructions calling into a module that >> could potentially be unloaded while that BPF program is still alive, >> right? >> >> I suppose that since we're checking that the attach iface is the same >> that the program should not be able to run after the module is unloaded, >> but it still seems a bit iffy. And we should definitely block >> BPF_PROG_RUN invocations of programs with a netdev set (but we should do >> that anyway). > > Ugh, good point about BPF_PROG_RUN, seems like it should be blocked > regardless of the locking scheme though, right? > Since our mlx4/mlx5 changes expect something after the xdp_buff, we > can't use those per-netdev programs with our generic > bpf_prog_test_run_xdp... Yup, I think we should just block it for now; maybe it can be enabled later if it turns out to be useful (and we find a way to resolve the kfuncs for this case). Also, speaking of things we need to disable, tail calls is another one. And for freplace program attachment we need to add a check that the target interfaces match as well. >> > (potentially can be a problem if the same slub slot is reused >> > for another netdev later on?) >> >> Yeah, this would be bad as well, obviously. I guess this could happen? > > Not sure, that's why I'm raising it here to see what others think :-) > Seems like this has to be actively exploited to happen? (and it's a > privileged operation) > > Alternatively, we can go back to the original version where the prog > holds the device. > Matin mentioned in the previous version that if we were to hold a > netdev refcnt, we'd have to drop it also from unregister_netdevice. Yeah; I guess we could keep a list of "bound" XDP programs in struct net_device and clear each one on unregister? Also, bear in mind that the "unregister" callback is also called when a netdev moves between namespaces; which is probably not what we want in this case? > It feels like beyond that extra dev_put, we'd need to reset our > aux->xdp_netdev and/or add some flag or something else to indicate > that this bpf program is "orphaned" and can't be attached anywhere > anymore (since the device is gone; netdev_run_todo should free the > netdev it seems). You could add a flag, and change the check to: + if (new_prog->aux->xdp_has_netdev && + new_prog->aux->xdp_netdev != dev) { + NL_SET_ERR_MSG(extack, "Cannot attach to a different target device"); + return -EINVAL; + } That way the check will always fail if xdp_netdev is reset to NULL (while keeping the flag) on dereg? > That should address this potential issue with reusing the same addr > for another netdev, but is a bit more complicated code-wise. > Thoughts? I'd be in favour of adding this tracking; I worry that we'll end up with some very subtle and hard-to-debug bugs if we somehow do end up executing the wrong kfuncs... -Toke