On Tue, 25 Apr 2017 20:07:34 -0700
John Fastabend <john.fastabend@xxxxxxxxx> wrote:

> On 17-04-25 05:26 PM, Alexei Starovoitov wrote:
> > On Tue, Apr 25, 2017 at 11:34:53AM +0200, Jesper Dangaard Brouer wrote:
> >>> Note the very first bpf patchset years ago contained the port table
> >>> abstraction. ovs has concept of vports as well. These two very
> >>> different projects needed port table to provide a layer of
> >>> indirection between ifindex==netdev and virtual port number.
> >>> This is still the case and I'd like to see this port table to be
> >>> implemented for both cls_bpf and xdp. In that sense xdp is not
> >>> special.
> >>
> >> Glad to hear you want to see this implemented, I will start coding on
> >> this then. Good point with cls_bpf, I was planning to make this port
> >> table strongly connected to XDP, guess I should also think of cls_bpf.
> >
> > perfect.
> > I think we should try to make all additions to bpf networking world
> > to be usable for both tc and xdp, since both are actively used and
> > it wouldn't be great to have cool feature for one, but not the other.
> > I think port table is an excellent candidate that applies to both.
>
> +1
>
> Jesper, I was working up the code for the redirect piece for ixgbe and
> virtio, please use this as a base for your virtual port number table. I'll
> push an update onto github tomorrow. I think the table should drop in fairly
> nicely.

Cool, I will do that. Then I'll also have a redirect method to shape
this around, and I will have to benchmark/test your ixgbe redirect.

(John, please let me know what github tree we are talking about, and
which branch.)

> One piece that isn't clear to me is how do you plan to instantiate and
> program this table. Is it a new static bpf map that is created any
> time we see the redirect command? I think this would be preferred.

(This is difficult to explain without us misunderstanding each other.)

As Alexei also mentioned before, ifindex vs. port makes no real
difference seen from the bpf program side. It is userspace's
responsibility to add ifindexes/ports to the bpf-maps, according to
how the bpf program's "policy" wants to "connect" these ports.

The port-table system adds one extra step: the port must also be added
to the port table (which lives inside the kernel). When loading the
XDP program, we also need to pass along the port table "id" that this
XDP program is associated with (and create the table if it doesn't
exist yet). Your userspace "control-plane" application also needs to
know this port table "id" when adding a new port.
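To make the idea concrete, here is a rough sketch of how I imagine the
two sides could fit together. To be clear, nothing below exists:
BPF_MAP_TYPE_PORT_TABLE and bpf_redirect_port() are made-up names for
this proposal; only SEC(), struct bpf_map_def and bpf_map_update_elem()
follow the existing samples/bpf conventions.

/* XDP-program side: the program only ever sees virtual port numbers.
 * Both the map type and the redirect helper are HYPOTHETICAL. */
#include <linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") port_table = {
	.type        = BPF_MAP_TYPE_PORT_TABLE, /* hypothetical map type */
	.key_size    = sizeof(__u32),           /* virtual port number   */
	.value_size  = sizeof(__u32),           /* e.g. per-port flags   */
	.max_entries = 64,
};

SEC("xdp_redirect_port")
int xdp_prog(struct xdp_md *ctx)
{
	__u32 out_port = 2; /* policy decision; normally from a lookup */

	/* Hypothetical helper: the port was validated when userspace
	 * inserted it, so an unknown port fails the lookup cleanly
	 * (e.g. XDP_ABORTED) instead of hitting a wrong ifindex. */
	return bpf_redirect_port(&port_table, out_port);
}

char _license[] SEC("license") = "GPL";

/* Control-plane side (userspace, using the samples/bpf map helpers):
 * bind the ifindex of a netdev to a virtual port. The kernel resolves
 * ifindex -> net_device here, at insertion time, and rejects invalid
 * devices -- this is where the run-time validation moves to. */
#include <net/if.h>
#include "libbpf.h"	/* samples/bpf bpf_map_update_elem() wrapper */

int add_port(int map_fd, __u32 port, const char *dev)
{
	__u32 ifindex = if_nametoindex(dev);

	if (ifindex == 0)
		return -1; /* no such device */

	return bpf_map_update_elem(map_fd, &port, &ifindex, BPF_ANY);
}

How the XDP program gets associated with a port table "id" at load
time (e.g. a new netlink attribute next to IFLA_XDP) is left out
above, but the important property is that the ifindex validation
disappears from the XDP fast path.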
The concept of having multiple port tables is key, as it implies we
can have several simultaneous "data-planes" that are *isolated* from
each other. Think about how network-namespaces/containers want
isolation.

A subtle thing I'm almost afraid to mention: as opposed to the ifindex
model, a port table mapping ports to net_device pointers would allow
(faster) delivery into a container's inner net_device. This sort of
violates the isolation, but I would argue it is not a problem, as such
a net_device pointer could only be added by a process within the
namespace. I like this feature, but it could easily be disallowed via
insertion-time validation of the port.

> >> I'm not worried about the DROP case, I agree that is fine (as you
> >> also say). The problem is unintentionally sending a packet to a
> >> wrong ifindex. This is clearly an eBPF program error, BUT with
> >> XDP this becomes a very hard to debug program error. With
> >> TC-redirect/cls_bpf we can tcpdump the packets, with XDP there is
> >> no visibility into this happening (the NSA is going to love this
> >> "feature"). Maybe we could add yet-another tracepoint to allow
> >> debugging this. My proposal is that we simply remove the
> >> possibility for such program errors by, as you say, moving the
> >> validation from run-time into static insertion-time, via a port
> >> table.
> >
> > I think lack of tcpdump-like debugging in xdp is a separate issue.
> > As I was saying in the other thread we have trivial 'xdpdump'
> > kern+user app that emits pcap file, but it's too specific to how we
> > use tail_calls+prog_array in our xdp setup. I'm working on the
> > program chaining that will be generic and allow us transparently
> > add multiple xdp or tc progs to the same attachment point and will
> > allow us to do 'xdpdump' at any point of this pipeline, so
> > debugging of what happened to the packet will be easier and done in
> > the same way for both tc and xdp.
> > btw in our experience working with both tc and xdp the tc+bpf was
> > actually harder to use and more bug prone.
>
> Nice, the tcpdump-like debugging looks interesting.

Yes, this xdpdump sounds like a very useful tool.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer