Re: [PATCH] Add tcindex to conntrack and add netfilter target/matches

Luuk Paulussen <Luuk.Paulussen@xxxxxxxxxxxxxxxxxxx> · Sun, 13 Dec 2015 23:00:15 +0000

On 12/09/2015 10:07 PM, Daniel Borkmann wrote:
> On 12/07/2015 03:19 AM, Luuk Paulussen wrote:
>> On 12/07/2015 11:45 AM, Florian Westphal wrote:
>>> Luuk Paulussen <Luuk.Paulussen@xxxxxxxxxxxxxxxxxxx> wrote:
>>>> Hi All,
>>>>
>>>> I'm still hoping for some feedback on this.  I have some userspace
>>>> patches around this as well, (to set/show the tc_index in the
>>>> connection, and to add the marking/matching rules in iptables), but
>>>> I am holding off on sending them until I know what people think of
>>>> this idea/implementation first.
>>> I can't say for sure since I don't know enough about tc.
>>>
>>> However, AFAICS tc_index seems to be something that should be internal
>>> to tc and not exposed/changeable via iptables.
>> tc_index is a mark that can be set by certain configurable ingress
>> schedulers (dsmark, GRED, ingress) for later classification via the
>> tcindex classifier.  This just adds an alternative mechanism for setting
>> this mark if those schedulers aren't being used.
>
> Fwiw, tc_index can be read/written by cls_bpf (and you can also apply
> masks on that field if needed).
I've just been looking into this. It does seem to cover a small part of
what we are trying to do, but it misses the key part, which is to use
connection tracking information to limit full processing to the first
packet of a flow in each direction. I'm guessing that there isn't any
bpf support for connection information?

One thing isn't quite clear to me: is it possible to use xt_bpf.c to
set the tc_index field from netfilter?  If so, then it does set a
precedent for being able to set this value outside of tc code (but it
still misses the save/restore possibility).

Given that tc_index is simple metadata, I'm guessing that bpf filter
performance wouldn't be significantly better than that of the tcindex
classifier?

>> * dsmark sets the tc_index value based on the incoming DSCP value
>> * ingress sets the tc_index value based on other rules (e.g. mark set
>> via iptables)
>> * New code sets tc_index directly based on iptables classification or
>> restoring saved value.

I'm still looking for an overall idea of whether this patch has a
chance of being accepted into the kernel.  It feels like none of the
comments or proposed ideas so far have addressed the issues that the
patch is trying to solve:
1. Save/restore of the mark via connmark can significantly improve
performance for larger rule sets, because full processing is then
limited to the first packet of a flow in each direction.
2. There is insufficient space in the skb nf mark and the connection
mark for all of the applications that might want to use them.
3. tc, being one of the users of the nf mark (via the fw filter), has a
logical alternative in the 16-bit tc_index field, which could be used
without increasing skb size.  This field doesn't currently have a
match/tag target in netfilter or an analogue in the connection for
save/restore.  It does, however, have a pre-existing classifier in the
tc code.
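
To illustrate points 2 and 3, here is a minimal userspace sketch in C of
what sharing the 32-bit mark looks like.  The split into MARK_MASK_TC
and MARK_MASK_APP and the helper mark_set_masked() are hypothetical and
are not taken from the patch or from any kernel header; they only show
that every user of the mark has to stay inside its own mask, similar in
spirit to the value/mask forms that the iptables MARK target accepts.

```c
#include <stdint.h>

/* Hypothetical split of the 32-bit nf mark between two users. */
#define MARK_MASK_TC   0x0000ffffu	/* bits given up to tc (fw filter) */
#define MARK_MASK_APP  0xffff0000u	/* bits left for everything else */

/*
 * Update only the bits selected by mask and leave the bits owned by
 * other users of the mark untouched.
 */
static uint32_t mark_set_masked(uint32_t mark, uint32_t value, uint32_t mask)
{
	return (mark & ~mask) | (value & mask);
}
```

If the tc classification lived in the separate 16-bit tc_index field
instead, the bits under MARK_MASK_TC in this sketch would be free for
other uses of the mark.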

So this patch adds a tc_index field to the connection and
match/tag/save/restore targets to netfilter, allowing packets to be
marked for tc in this field and the value to be saved to and restored
from the connection.
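
To make the save/restore idea concrete, here is a minimal userspace
sketch in C.  It is not the patch itself: struct model_pkt and struct
model_conn are hypothetical stand-ins for the skb and the conntrack
entry, model_full_classify() stands in for walking a large rule set,
and treating a tc_index of 0 as "nothing saved yet" is an assumption
made only for this example.  For brevity it models a single direction
of a flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a conntrack entry. */
struct model_conn {
	uint16_t tc_index;	/* proposed per-connection copy of tc_index */
};

/* Hypothetical stand-in for an skb; only the relevant fields. */
struct model_pkt {
	uint16_t tc_index;	/* mirrors the 16-bit tc_index in the skb */
	struct model_conn *ct;	/* connection of this packet, or NULL */
};

/* Counts how often the expensive path runs. */
static unsigned int model_full_classify_calls;

/* Stand-in for a full rule-set walk; always picks the same class. */
static uint16_t model_full_classify(void)
{
	model_full_classify_calls++;
	return 0x2a;
}

/* "save": copy the packet's tc_index into its connection, if any. */
static void model_tcindex_save(const struct model_pkt *pkt)
{
	assert(pkt != NULL);
	if (pkt->ct != NULL)
		pkt->ct->tc_index = pkt->tc_index;
}

/*
 * "restore": copy the saved value back onto the packet.  Returns true
 * only if the connection held a non-zero value (assumed to mean "set").
 */
static bool model_tcindex_restore(struct model_pkt *pkt)
{
	assert(pkt != NULL);
	if (pkt->ct == NULL || pkt->ct->tc_index == 0)
		return false;
	pkt->tc_index = pkt->ct->tc_index;
	return true;
}

/* Per-packet path: restore when possible, else classify and save. */
static void model_process(struct model_pkt *pkt)
{
	if (model_tcindex_restore(pkt))
		return;
	pkt->tc_index = model_full_classify();
	model_tcindex_save(pkt);
}
```

In this model, only the first packet of a connection goes through
model_full_classify(); later packets pick up the saved value from the
connection, which is the behaviour that mark/connmark save/restore
already gives for the nf mark.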