On 02/20/2018 11:44 AM, Pablo Neira Ayuso wrote:
> Hi David!
>
> On Mon, Feb 19, 2018 at 12:22:26PM -0500, David Miller wrote:
> [...]
>> Netfilter's chronic performance differential is why a lot of mindshare
>> was lost to userspace networking technologies.
>
> Claiming that Netfilter is the reason for the massive adoption of
> userspace networking isn't a fair statement at all.
>
> Let's talk about performance if this is what you want:
>
> * Our benchmarks here are delivering a ~9.5x performance boost for
>   IPv4 load balancing from netfilter ingress.
>
> * ~2x faster than iptables prerouting when dropping packets at a very
>   early stage in the network datapath - DoS attack scenario - again
>   from the ingress hook.
>
> * The new flowtable infrastructure that will show up in 4.16 provides
>   a faster forwarding path, measuring ~2x faster forwarding here, _by
>   simply adding one single rule to your FORWARD chain_. And that's
>   just the initial implementation that got merged upstream; we have
>   room to fly even faster.
>
> And that's just the beginning: we have more ongoing work, incrementally
> based on top of what we have, to provide even faster datapaths with
> very simple configurations.
>
> Note that the numbers above are very similar to what we have seen in
> bpf. Well, to be honest, we're just slightly behind bpf, since the
> benchmarks I have seen on load balancing IPv4 show 10x from XDP, and
> dropping packets is also slightly more than 2x, which actually happens
> way earlier than ingress - naturally, dropping earlier gives better
> numbers.
>
> But it's not all about performance... let's have a look at the "iron
> triangle"...
>
> We keep usability on our radar; that's paramount for us. Netfilter is
> probably so much more widely adopted than tc because of this. The kind

Right, in terms of performance, the above is what tc ingress has been
able to do for a long time already, ever since the central spinlock
could be lifted, which was an important step in that direction.
In terms of usability, sure, it's always a 'fun' topic for a number of
classifiers / actions, mostly from the older days. I think it has
improved a bit over time, but at least speaking of things like cls_bpf,
it's trivial to attach an object somewhere via the tc cmdline.

> of problems that big Silicon datacenters have to face are simply
> different from the millions of devices running Linux out there; there
> are plenty of smart devops out there who sacrifice the little
> performance loss at the cost of keeping it easy to configure and
> maintain things.
>
> If we want to talk about problems...
>
> Every project has its own subset of problems. In that sense, anyone
> who has spent time playing with the bpf infrastructure is very much
> aware of all of its usability problems:
>
> * You have to disable optimizations in llvm, otherwise the verifier
>   gets confused by too smart compiler optimizations and rejects the
>   code.

That is actually a false claim, which makes me think that you didn't
even give this a try at all before stating the above. Funny enough, for
a very long period of time in LLVM's BPF back end, when you used
optimization levels other than -O2, clang would bark with an internal
error, for example:

$ clang-3.9 -target bpf -O0 -c foo.c -o /tmp/foo.o
fatal error: error in backend: Cannot select: 0x5633ae698280: ch,glue = BPFISD::CALL 0x5633ae698210,
  0x5633ae697e90, Register:i64 %R1, Register:i64 %R2, Register:i64 %R3, 0x5633ae698210:1
    0x5633ae697e90: i64,ch = load<LD8[@tail_call]> 0x5633ae6955e0, 0x5633ae694fc0, undef:i64
      0x5633ae694fc0: i64 = BPFISD::Wrapper TargetGlobalAddress:i64<void (%struct.__sk_buff*, i8*, i32)** @tail_call> 0
[...]

Whereas -O2 *is* the general recommendation for everyone to use:

$ clang-3.9 -target bpf -O2 -c foo.c -o /tmp/foo.o
$

This is fixed in later versions; e.g. in clang-7.0 such a back end
error is gone anyway, fwiw.
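To illustrate the cls_bpf point above, the whole workflow fits in a
few commands - a minimal sketch, where foo.c, the "ingress" section
name and eth0 are placeholder examples, not taken from this thread:

```shell
# Build the classifier object with the recommended -O2 level
# (foo.c contains a cls_bpf program, e.g. one returning TC_ACT_OK)
clang -O2 -target bpf -c foo.c -o foo.o

# Create the clsact qdisc once, then attach the object on ingress;
# 'da' (direct-action) lets the BPF program return tc verdicts itself
tc qdisc add dev eth0 clsact
tc filter add dev eth0 ingress bpf da obj foo.o sec ingress
```

Swapping in a new program is then just a matter of recompiling and
replacing the filter, no tool beyond clang and tc needed.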
But in any case, we've been running complex programs with -O2
optimization levels for several years now just fine. Yes, given we do
push BPF to the limits, we had some corner cases where the verifier had
to be adjusted, but overall the number of such cases has reduced over
time, which is also a natural progression as people use it in various
advanced ways. In fact, it's a much better choice to use clang with -O2
here, since the majority of people simply use it that way. And if you
consume it via higher-level front ends, e.g. bcc, ply, bpftrace to name
a few from the tracing side, then you don't need to care about this at
all. (But in addition to that, there's also continuous effort on the
LLVM side to optimize BPF code generation in various ways.)

> * Very hard to debug the reason why the verifier is rejecting
>   apparently valid code. That results in people playing strange
>   "moving code around up and down".

Please show me your programs and I'm happy to help you out. :-) Yes, in
the earlier days I would concede it might have been hard; over the
course of the last few years, the verifier and the LLVM back end have
both seen heavy improvements all over the place - e.g. llvm-objdump
correlating verifier errors back to the pseudo C code via dwarf was a
bigger one on the latter side, for example. Writing BPF programs has
definitely become easier, although there's always undoubted room for
improvement, and the work we're heading towards will make it more
natural to develop programs against the C front end it provides,
further reducing potential contention with the verifier. It takes a bit
to get used to the verifier's analysis, but then there's always a
learning curve for getting into new frameworks and developing a basic
understanding of their semantics. The same holds true when people
switch from their known ip*tables-translate syntax to using nft
directly.
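For reference, the llvm-objdump correlation mentioned above looks
roughly like this - a sketch assuming a clang/LLVM build with the bpf
target, with prog.c as a placeholder name:

```shell
# Build with -g so the BPF object carries dwarf debug info
clang -O2 -g -target bpf -c prog.c -o prog.o

# Dump the BPF instructions interleaved with the original C source;
# the instruction offsets shown here can be matched against the
# instruction numbers the verifier prints in its rejection log
llvm-objdump -S prog.o
```

That turns "insn 42 rejected" from the verifier into a pointer at a
concrete line of your C code instead of a guessing game.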
Anyway, aside from this, for BPF we also have the case that the people
who develop programs to solve problems with the help of this technology
are just a small subset of the ones using it. The best example is
probably, as mentioned earlier in the thread, the hard work from
Brendan Gregg and many others on bcc to develop all the really easy to
consume tracing cmdline tools.

> * Lack of sufficient abstraction: bpf is not only exposing its own
>   software bugs through its interface, but it will also bite the dust
>   with CPU bugs due to lack of glue code to hide details behind the
>   syscall interface curtain. That will need a kernel upgrade after all
>   to fix, so all benefits of adding new programs. We've even seen
>   claims on this mailing list that performance is more important than
>   security. Don't get me wrong, no software is safe from security
>   issues, but if you don't abstract your resources in the right way,
>   you have more chance to experience more problems.

Sorry, but this is just nebulous FUD. Yes, every software has bugs. If
there are bugs, we handle them and fix them, period. So? You've
probably seen the extensive kernel selftest suite we have developed
over time, which by now contains more than 1k test cases on the BPF
core infrastructure, with many more to come. Quite frankly, I'm
actually very happy about the progress from the syzkaller folks in
recent months as well in stressing BPF continuously - and it finds bugs
just as well in other areas (like netfilter), so yeah, we all keep our
heads down and fix them properly in order to make everything more
robust.

> Just to mention a few of them.
>
> So, please, let's each of us focus on our own work. Let me remind you
> of your own wise words - I think just one year ago, in another of
> these episodes of bpf vs. netfilter: "We're all working to achieve
> the same goals", even if we're working on competing projects inside
> Linux.
>
> Thanks!
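For anyone who wants to check, the selftest suite referenced above
lives in the mainline kernel tree and can be run locally - a rough
sketch, assuming you're at the top of a kernel source tree and have
root (most tests load programs into the kernel):

```shell
# The BPF selftests live under tools/testing/selftests/bpf
cd tools/testing/selftests/bpf
make

# test_verifier runs the verifier's accept/reject test corpus,
# i.e. the bulk of those 1k+ core infrastructure test cases
sudo ./test_verifier
```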