Re: [PATCH 6/6] net: move qdisc ingress filtering on top of netfilter ingress hooks

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> · Thu, 30 Apr 2015 01:32:05 +0200

On Wed, Apr 29, 2015 at 10:27:05PM +0200, Daniel Borkmann wrote:
> On 04/29/2015 08:53 PM, Pablo Neira Ayuso wrote:
> >Porting qdisc ingress on top of the Netfilter ingress allows us to detach the
> >qdisc ingress filtering code from the core, so now it resides where it really
> >belongs.
> 
> Hm, but that means, in case you have a tc ingress qdisc attached
> with one single (ideal) or more (less ideal) classifier/actions,
> the path we _now_ have to traverse just to a single tc classifier
> invocation is, if I spot this correctly, f.e.:
> 
>  __netif_receive_skb_core()
>  `-> nf_hook_ingress()
>   `-> nf_hook_do_ingress()
>    `-> nf_hook_slow()
>     `-> [for each entry in hook list]
>      `-> nf_iterate()
>       `-> (*elemp)->hook()
>        `-> handle_ing()
>         `-> ing_filter()
>          `-> qdisc_enqueue_root()
>           `-> sch->enqueue()
>            `-> ingress_enqueue()
>             `-> tc_classify()
>              `-> tc_classify_compat()
>               `-> [for each attached classifier]
>                `-> tp->classify()
>                 `-> f.e. cls_bpf_classify()
>                  `-> [for each classifier from plist]
>                   `-> BPF_PROG_RUN()

Actually, the extra cost is roughly (leaving out inlined functions and
other non-relevant parts):

    `-> nf_hook_slow()
     `-> [for each entry in hook list]
      `-> nf_iterate()
       `-> (*elemp)->hook()

as part of the generic hook infrastructure, which comes with extra
flexibility in return. I think the main concern so far was not to harm
the critical __netif_receive_skb_core() path, and this patchset proves
not to affect it.
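
To make that extra cost concrete, here is a rough userspace sketch of
what a generic hook list walk amounts to: one indirect call per
registered entry, stopping at the first non-accept verdict. This is
not the kernel code; the names (run_hooks, hook_entry, V_ACCEPT,
V_DROP, struct pkt) are made up for illustration and only loosely
modelled on nf_iterate() and the NF_* verdicts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts, loosely modelled on NF_ACCEPT / NF_DROP. */
enum verdict { V_ACCEPT, V_DROP };

/* Hypothetical stand-in for an skb; hooks_seen counts hook invocations. */
struct pkt {
	int len;
	int hooks_seen;
};

typedef enum verdict (*hook_fn)(struct pkt *p);

/* One registered hook; entries form a singly linked list. */
struct hook_entry {
	hook_fn hook;
	struct hook_entry *next;
};

/*
 * Walk the hook list: one indirect call per entry.
 * Stops at the first hook that does not accept the packet.
 * An empty list accepts the packet without any call.
 */
static enum verdict run_hooks(const struct hook_entry *head, struct pkt *p)
{
	const struct hook_entry *e;

	for (e = head; e != NULL; e = e->next) {
		assert(e->hook != NULL);
		if (e->hook(p) != V_ACCEPT)
			return V_DROP;
	}
	return V_ACCEPT;
}

/* Example hook: accepts everything, only records that it ran. */
static enum verdict count_hook(struct pkt *p)
{
	p->hooks_seen++;
	return V_ACCEPT;
}

/* Example hook: drops packets shorter than 60 bytes. */
static enum verdict drop_short(struct pkt *p)
{
	p->hooks_seen++;
	return p->len < 60 ? V_DROP : V_ACCEPT;
}
```

With a single entry registered, as in the qdisc ingress case, the walk
boils down to one loop iteration and one indirect call.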

BTW, the sch->enqueue() can easily go away after this patchset, see
attached patch.

> What was actually mentioned in the other thread where we'd like to
> see a more lightweight ingress qdisc is to cut that down tremendously
> to increase pps rate, provided that we would be able to process
> a path roughly like:
> 
>  __netif_receive_skb_core()
>  `-> tc_classify()
>   `-> tc_classify_compat()
>     `-> [for each attached classifier]
>       `-> tp->classify()
>         `-> f.e. cls_bpf_classify()
>           `-> [for each classifier from plist]
>             `-> BPF_PROG_RUN()
> 
> Therefore, I think it would be better to not wrap that ingress qdisc
> part of the patch set into even more layers. What do you think?

I think the main front for improving performance in qdisc ingress is
to remove the central spinlock that is harming scalability. There are
also the built-in rule counters there, which look problematic. So I
would focus on improving performance in the qdisc ingress core
infrastructure itself.
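
As an illustration of the counter side only, one common way to avoid
a shared lock on statistics is to keep one slot per CPU and sum the
slots on read. The sketch below is a userspace model under that
assumption, not what any patch in this series does; rule_stats,
cpu_stats and NR_CPUS_SKETCH are hypothetical names, and the 64-byte
padding assumes a 64-byte cache line.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed CPU count */

/* Per-CPU slot, padded to an assumed 64-byte cache line. */
struct cpu_stats {
	uint64_t packets;
	uint64_t bytes;
	char pad[64 - 2 * sizeof(uint64_t)];
};

struct rule_stats {
	struct cpu_stats cpu[NR_CPUS_SKETCH];
};

/* Fast path: each CPU only touches its own slot, so no lock is taken. */
static void rule_stats_update(struct rule_stats *s, unsigned int cpu,
			      uint64_t len)
{
	assert(cpu < NR_CPUS_SKETCH);
	s->cpu[cpu].packets++;
	s->cpu[cpu].bytes += len;
}

/* Slow path (e.g. a stats dump): sum the packet counts of all slots. */
static uint64_t rule_stats_packets(const struct rule_stats *s)
{
	uint64_t sum = 0;
	unsigned int i;

	for (i = 0; i < NR_CPUS_SKETCH; i++)
		sum += s->cpu[i].packets;
	return sum;
}

/* Slow path: sum the byte counts of all slots. */
static uint64_t rule_stats_bytes(const struct rule_stats *s)
{
	uint64_t sum = 0;
	unsigned int i;

	for (i = 0; i < NR_CPUS_SKETCH; i++)
		sum += s->cpu[i].bytes;
	return sum;
}
```

The trade-off is that reads become more expensive and only
approximately consistent, which is usually acceptable for statistics.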

On the bugfix front, the illegal mangling of shared skbs by actions
like stateless nat and bpf also looks important to address. David
already suggested propagating a state object that keeps a pointer to
the skb that is passed to the action. That way, the action can clone
it and hand the skb back to the ingress path. I started a patchset to
do so here; it's a bit large since it requires quite a lot of function
signature adjustments.
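
Roughly, the idea looks like the following userspace sketch. It is
only a model of the approach, not the patchset itself: struct pkt
stands in for the skb, users for its reference count, and act_state,
pkt_unshare() and act_mangle() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb. */
struct pkt {
	int users;		/* reference count; > 1 means shared */
	size_t len;
	unsigned char data[64];
};

/* State object handed to the action instead of the bare packet. */
struct act_state {
	struct pkt *pkt;
};

/*
 * Return a packet that is safe to modify. If the packet is shared,
 * make a private copy and drop our reference on the original.
 * Returns NULL if the copy cannot be allocated.
 */
static struct pkt *pkt_unshare(struct pkt *p)
{
	struct pkt *copy;

	if (p->users == 1)
		return p;

	copy = malloc(sizeof(*copy));
	if (copy == NULL)
		return NULL;
	memcpy(copy, p, sizeof(*copy));
	copy->users = 1;
	p->users--;
	return copy;
}

/*
 * Hypothetical mangling action: rewrites the first payload byte.
 * Returns 0 on success, -1 on allocation failure.
 */
static int act_mangle(struct act_state *st, unsigned char val)
{
	struct pkt *p = pkt_unshare(st->pkt);

	if (p == NULL)
		return -1;
	assert(p->users == 1);
	p->data[0] = val;
	st->pkt = p;	/* caller continues with the possibly new packet */
	return 0;
}
```

The point is that the caller reads the packet pointer back from the
state object after the action has run, so other holders of the
original never see the modification.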

I can also see there were intentions to support userspace queueing at
some point, since TC_ACT_QUEUED has been there since the beginning.
That should be possible at some point using this infrastructure (once
there are no further concerns about the netif_receive_core_finish()
patch, as long as gcc 4.9 and follow-up versions keep inlining this
new function).