On Wed, Mar 12, 2014 at 08:29:07PM -0700, Alexei Starovoitov wrote: > On Wed, Mar 12, 2014 at 2:15 AM, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: [...] > The patches don't explain the reasons to do nft socket filtering. OK, some reasons from the interface point of view: 1) It provides an extensible interface to userspace. We didn't invented a new wheel in that regard, we just reused the extensibility of TLVs used in netlink as intermediate format between user and kernelspace, also used many other applications outthere. The TLV parsing and building is not new code, most that of that code has been exposed to userspace already through netlink. 2) It shows that, with little generalisation, we can open the door to one single *classification interface* for the users. Just make some little googling, you'll find *lots of people* barfing on the fact that we have that many interfaces to classify packets in Linux. And I'm *not* talking about the packet classification approach itself, that's a different debate of course. [...] > Simplicity and performance should be the deciding factor. > imo nft+sock_filter example is not simple. OK, some comments in that regard: 1) Simplicity: With the nft approach you can just use a filter expressed in json, eg. {"rule": {"expr":[ {"type":"meta","dreg":1,"key":"protocol"}, {"type":"cmp","sreg":1,"op":"eq","cmpdata":{"data_reg":{"type":"value","len":2,"data0":"0x00000008"}}}, {"type":"payload","dreg":1,"offset":9,"len":1,"base":"network"}, {"type":"cmp","sreg":1,"op":"eq","cmpdata":{"data_reg":{"type":"value","len":1,"data0":"0x00000006"}}}, {"type":"immediate","dreg":0,"immediatedata":{"data_reg":{"type":"verdict","verdict":"0xffffffff"}}}] } } Which is still more readable and easier to maintain that a BPF snippet. So you just pass it to the json parser in the libnftnl library (or whatever other better minimalistic library we would ever have) which transforms this to TLV format that you can pass to the kernel. The kernel will do the job to translate this. How are users going to integrate the restricted C's eBPF code infrastructure into their projects? I don't think that will be simple. They will likely include the BPF snippet to avoid all the integration burden, as it already happens in many projects with BPF filters. 2) Performance. Patrick has been doing many progress with the generic set infrastructure for nft. In that regard, we aim to achieve performance by arranging data using performance data structures, *jit is not the only way to boost performance* (although we also want to have it). Some example: set type IPv4 address = { N thousands of IPv4 addresses } reg1 <- payload(network header, offsetof(struct iphdr, daddr), 4) lookup(reg1, set) reg0 <- immediate(0xffffffff) Or even better, using dictionaries: set type IPv4 address = { 1.1.1.1 : accept, 2.2.2.2 : accept, 3.3.3.3 : drop ...} reg1 <- payload(network header, offsetof(struct iphdr, daddr), 4) reg0 <- lookup(reg1, set) Where "accept" is an alias of 0xffffffff and "drop" is 0 in the nft-sock case. The verdict part has been generalised so we can adapt nft to the corresponding packet classification engine. This part just needs a follow-up patch to add set-based filtering for nft-sockets. See this for more information about ongoing efforts on the nft set infrastructure: http://patchwork.ozlabs.org/patch/326368/ that will also integrate into nft-sock. [...] > > +struct sock_filter_ext { > > + __u8 code; /* opcode */ > > + __u8 a_reg:4; /* dest register */ > > + __u8 x_reg:4; /* source register */ > > + __s16 off; /* signed offset */ > > + __s32 imm; /* signed immediate constant */ > > +}; > > > > If I didn't miss anything from your patchset, that structure is again > > exposed to userspace, so we won't be able to change it ever unless we > > add a new binary interface. > > > > From the user interface perspective, our nft approach is quite > > different, from userspace you express the filter using TLVs (like in > > the payload of the netlink message). Then, nf_tables core > > infrastructure parses this and transforms it to the kernel > > representation. It's very flexible since you don't expose the internal > > representation to userspace, which means that we can change the > > internal layout anytime, thus, highly extensible. > > bpf/ebpf is an instruction set. Arguing about fixed vs non-fixed > insn size is like arguing x86 variable encoding vs sparc fixed. > Both are infinitely flexible. Right, you can extend interfaces forever with lots of patchwork and "smart tricks" but that doesn't mean that will look nice... As I said, I believe that having a nice extensible interface is extremely important to make it easier for development. If we have to rearrange the internal representation for some reason, we can do it indeed without bothering about making translations to avoid breaking userspace and having to use ugly tricks (just see sk_decode_filter() or any other translation to support any new way to express a filter...). [...] > Say you want to translate nft-cmp instruction into sequence of native > comparisons. You'd need to use load from memory, compare and > branch operations. That's ebpf! Nope sorry, that's not ebpf. That's assembler code. [...] > You can view ebpf as a tool to achieve jiting of nft. > It will save you a lot of time. nft interface is already well-abstracted from the representation, so I don't find a good reason to make a step backward that will force us to represent the instructions using a fixed layout structure that is exposed to userspace, that we won't be able to change once if this gets into mainstream. Probably the nft is not the easiest path, I agree, it's been not so far if you look at the record. But with time and development hours from everyone, I believe we'll enjoy a nice framework. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html