On Thu, Sep 28, 2023 at 11:17 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Daniel Borkmann <daniel@xxxxxxxxxxxxx> writes:
>
> > This work adds a new, minimal BPF-programmable device called "meta"
> > which we recently presented at LSF/MM/BPF. The latter name derives
> > from the Greek μετά, encompassing a wide array of meanings such as
> > "on top of" and "beyond". Given that the business logic is defined
> > by BPF, this device can have many meanings. The core idea is that
> > BPF programs are executed within the driver's xmit routine,
> > therefore moving BPF processing closer to the source, e.g. in the
> > case of containers/Pods.
>
> I like the concept, but I think we should change the name (as I
> believe I also mentioned back when you presented it at LSF/MM/BPF). I
> know this is basically bikeshedding, but I nevertheless think it is
> important, for a couple of reasons:
>
> - As you say, meta has a specific meaning, and this device is not a
>   "meta" device in the common sense of the word: it is not tied to
>   other devices (so it's not 'on top of' anything), and it is not
>   "about" anything (as in metadata). It is just a device type that is
>   programmed by BPF, so let's call it that.
>
> - It's not discoverable; how are people supposed to figure out that
>   they should go look for a 'meta' device? We also already have
>   multiple things called 'metadata', so this is just going to create
>   even more confusion (as we also discussed in relation to 'xdp
>   hints').
>
> - It squats on a pretty widely used term throughout the kernel
>   (CONFIG_META, 'meta' as the module name). This is related to the
>   above point; seeing something named 'meta' in lsmod, the natural
>   assumption wouldn't be that it's a network driver.
>
> I think we should just name the driver 'bpfnet'; it's not pretty, but
> it's obvious and descriptive. Optionally we could teach 'ip' to
> understand just 'bpf' as the device type, so you could go 'ip link
> add type bpf' and get one of these.

+1

> > One of the goals was that, in the case of Pod egress traffic, this
> > allows moving BPF programs from hostns tcx ingress into the device
> > itself, providing earlier drop or forward mechanisms. For example,
> > if the BPF program determines that the skb must be sent out of the
> > node, then a redirect to the physical device can take place directly
> > without going through the per-CPU backlog queue. This helps to shift
> > processing for such traffic from softirq to process context, leading
> > to better scheduling decisions and better performance.
>
> So my only reservation to having this tied to a BPF-only device like
> this is basically that if this is indeed such a big win, shouldn't we
> try to make the stack operate in this mode by default? I assume you
> did the analysis of what it would take to change veth to operate in
> this mode; so what was the reason you decided to create a new device
> type instead?
>
> (I seem to recall at the presentation that you made a general
> reference to veth being 'too complex', but complexity can be managed,
> so I'm more thinking about whether there's any specific reason why
> changing veth wouldn't work at all?)

If one of the points is the queuing of packets on the softnet backlog
queue, I think it should be fine to call netif_receive_skb instead of
netif_rx, at least for a single device depth.
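
To make that last point a bit more concrete, here is a rough sketch of
what forwarding to the peer via netif_receive_skb instead of netif_rx
could look like in a paired virtual device's ndo_start_xmit. This is not
the actual veth or meta code; example_priv/example_xmit are made-up
names, and whether the unconditional inline receive is safe for stacked
devices is exactly the "single device depth" caveat above:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct example_priv {
	struct net_device __rcu *peer;
};

static netdev_tx_t example_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct example_priv *priv = netdev_priv(dev);
	struct net_device *peer;

	rcu_read_lock();
	peer = rcu_dereference(priv->peer);
	if (unlikely(!peer || !(peer->flags & IFF_UP))) {
		kfree_skb(skb);
		goto drop;
	}
	/* Scrubs the skb, retargets it at the peer and sets skb->protocol;
	 * frees the skb itself on failure.
	 */
	if (unlikely(__dev_forward_skb(peer, skb)))
		goto drop;
	/* netif_rx(skb) here would enqueue the skb on the per-CPU softnet
	 * backlog and defer the rest of the RX path to NET_RX_SOFTIRQ;
	 * netif_receive_skb() runs the RX path right away in the sender's
	 * context instead (ndo_start_xmit is entered with BHs disabled).
	 */
	netif_receive_skb(skb);
	rcu_read_unlock();
	return NETDEV_TX_OK;
drop:
	dev->stats.tx_dropped++;
	rcu_read_unlock();
	return NETDEV_TX_OK;
}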