On 10/28/2016 01:53 PM, Pablo Neira Ayuso wrote:
> On Thu, Oct 27, 2016 at 10:40:14AM +0200, Daniel Mack wrote:
>> It's not anything new. These hooks live on the very same level as
>> SO_ATTACH_FILTER. The only differences are that the BPF programs are
>> stored in the cgroup, and not in the socket, and that they exist for
>> egress as well.
>
> Can we agree this is going further than SO_ATTACH_FILTER?

It's the same level. Only the way of setting the program(s) is different.

>> Adding it there would mean we need to early-demux *every* packet as soon
>> as there is *any* such rule installed, and that renders many
>> optimizations in the kernel to drop traffic that has no local receiver
>> useless.
>
> I think such concern applies to doing early demux inconditionally in
> all possible scenarios (such as UDP broadcast/multicast), that implies
> wasted cycles for people not requiring this.

If you have a rule that acts on a condition based on a local receiver
detail such as a cgroup membership, then the INPUT filter *must* know
the local receiver for *all* packets passing by, otherwise it cannot act
upon it. And that means that you have to early-demux in any case as long
as at least one such rule exists.

> If we can do what demuxing in an optional way, ie. only when socket
> filtering is required, then only those that need it would pay that
> price. Actually, if we can do this demux very early, from ingress,
> performance numbers would be also good to perform any socket-based
> filtering.

For multicast, rules have to be executed for each receiver, which is
another reason why the INPUT path is the wrong place to solve the problem.

You actually convinced me yourself about these details, but you seem to
constantly change your opinion about all this. Why is this such a
whack-a-mole game?

> I guess you're using an old kernel and refering to iptables, this is
> not true for some time, so we don't have any impact now with loaded
> iptables modules.

My point is that the performance decrease introduced by my patch set is
not really measurable, even if you pipe all the wire-saturating test
traffic through the example program. At least not with my setup here. If
a local receiver has no applicable BPF program in its cgroup, the logic
bails out much earlier, leading to even less overhead. And if no cgroup
has any program attached, the code is basically a no-op thanks to the
static branch.

I really see no reason to block this patch set due to unfounded claims
of bad performance.


Thanks,
Daniel
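
For illustration, the two bail-out paths mentioned above (no program
attached anywhere, and no program in the receiver's cgroup) can be
modelled in user space roughly as follows. This is only a sketch under
stated assumptions, not code from the patch set: `cgroup_model`,
`attach_prog`, `run_ingress_filter` and `drop_short` are hypothetical
names, and a plain counter stands in for the kernel's static branch,
which avoids even the load and compare shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an attached program: true means "pass". */
typedef bool (*filter_fn)(const void *pkt, size_t len);

/* Hypothetical model of a cgroup that may carry an ingress program. */
struct cgroup_model {
    filter_fn ingress_prog;   /* NULL when nothing is attached */
};

/*
 * Assumption: a simple counter models the static branch. It is non-zero
 * only while at least one cgroup has a program attached.
 */
static int attached_programs;

/* Attach (prog != NULL) or detach (prog == NULL), keeping the counter in sync. */
static void attach_prog(struct cgroup_model *cg, filter_fn prog)
{
    if (cg->ingress_prog == NULL && prog != NULL)
        attached_programs++;
    else if (cg->ingress_prog != NULL && prog == NULL)
        attached_programs--;
    cg->ingress_prog = prog;
}

/* Returns true if the packet should be delivered to the receiver. */
static bool run_ingress_filter(const struct cgroup_model *cg,
                               const void *pkt, size_t len)
{
    /* Path 1: nothing attached anywhere, so no further work is done. */
    if (attached_programs == 0)
        return true;

    /* Path 2: this receiver's cgroup has no program, bail out early. */
    if (cg == NULL || cg->ingress_prog == NULL)
        return true;

    /* Only now is the per-packet cost of running the program paid. */
    return cg->ingress_prog(pkt, len);
}

/* Example program for the sketch: drop anything shorter than 20 bytes. */
static bool drop_short(const void *pkt, size_t len)
{
    (void)pkt;
    return len >= 20;
}
```

In this model, only receivers whose cgroup actually has a program
attached ever reach the call through `ingress_prog`; all others return
after one or two cheap checks.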