On Tue, Nov 29, 2022 at 12:50 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > > Stanislav Fomichev <sdf@xxxxxxxxxx> writes: > > > Please see the first patch in the series for the overall > > design and use-cases. > > > > Changes since v2: > > > > - Rework bpf_prog_aux->xdp_netdev refcnt (Martin) > > > > Switched to dropping the count early, after loading / verification is > > done. At attach time, the pointer value is used only for comparing > > the actual netdev at attach vs netdev at load. > > So if we're not holding the netdev reference, we'll end up with a BPF > program with hard-coded CALL instructions calling into a module that > could potentially be unloaded while that BPF program is still alive, > right? > > I suppose that since we're checking that the attach iface is the same > that the program should not be able to run after the module is unloaded, > but it still seems a bit iffy. And we should definitely block > BPF_PROG_RUN invocations of programs with a netdev set (but we should do > that anyway). Ugh, good point about BPF_PROG_RUN, seems like it should be blocked regardless of the locking scheme though, right? Since our mlx4/mlx5 changes expect something after the xdp_buff, we can't use those per-netdev programs with our generic bpf_prog_test_run_xdp... > > (potentially can be a problem if the same slub slot is reused > > for another netdev later on?) > > Yeah, this would be bad as well, obviously. I guess this could happen? Not sure, that's why I'm raising it here to see what others think :-) Seems like this has to be actively exploited to happen? (and it's a privileged operation) Alternatively, we can go back to the original version where the prog holds the device. Matin mentioned in the previous version that if we were to hold a netdev refcnt, we'd have to drop it also from unregister_netdevice. It feels like beyond that extra dev_put, we'd need to reset our aux->xdp_netdev and/or add some flag or something else to indicate that this bpf program is "orphaned" and can't be attached anywhere anymore (since the device is gone; netdev_run_todo should free the netdev it seems). That should address this potential issue with reusing the same addr for another netdev, but is a bit more complicated code-wise. Thoughts? > -Toke >