On Tue 15 Mar 2022 at 11:29, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> On Sat, Mar 12, 2022 at 10:05:55PM +0200, Vlad Buslov wrote:
>>
>> On Mon 07 Mar 2022 at 23:49, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
>> > On Tue, Feb 22, 2022 at 05:10:03PM +0200, Vlad Buslov wrote:
>> >> Add tracepoints to trace creation and start of execution of flowtable
>> >> hardware offload 'add', 'del' and 'stats' tasks. Move struct
>> >> flow_offload_work from source into header file to allow access to
>> >> structure fields from tracepoint code.
>> >
>> > This patch, I would prefer to keep it back and explore exposing trace
>> > infrastructure for the flowtable through netlink.
>> >
>>
>> What approach do you have in mind with netlink? I used tracepoints here
>> because they:
>>
>> - Incur no performance penalty when disabled.
>>
>> - Are handy to attach BPF programs to.
>>
>> In my experience with optimizing the TC control path, netlink parsing is
>> CPU-intensive. I am also not aware of mechanisms for attaching BPF
>> programs to it.
>
> Sure, no question tracing and introspection is useful.
>
> But could you use the generic workqueue trace points instead?

I can. In fact, this is exactly what I use to implement such scripts for
the current upstream kernel:

tracepoint:workqueue:workqueue_queue_work
/ str(args->workqueue) == "nf_ft_offload_add" /
{
        ...
}

However, note that this approach:

1. Requires knowledge of kernel infrastructure internals. We would like
   to make this accessible to more users than just kernel hackers.

2. Is probably slower due to the string comparison. I haven't benchmarked
   the CPU usage of scripts based on the workqueue tracepoints against
   their re-implementation with the new dedicated tracepoints from this
   patch, though.

> This is adding tracing infrastructure for a very specific purpose, to
> inspect the workqueue behaviour for the flowtable.
>
> And I am not sure how you use this yet other than observing that the
> workqueue is coping with the workload?

Well, there are multiple metrics that can constitute "coping". Besides
measuring workqueue size, we are also interested in the task processing
latency histogram, the workqueue task creation rate versus the task
processing rate, and so on. We could probably implement all of these
without any tracepoints at all by using just kprobes, but such programs
would be much more complicated and fragile.
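
For illustration, a minimal sketch of one such metric, a queue-to-execution
latency histogram built only on the generic workqueue tracepoints, could look
roughly like this (the map names and the filter on the 'add' workqueue are
just for the example, not the actual script we run):

tracepoint:workqueue:workqueue_queue_work
/ str(args->workqueue) == "nf_ft_offload_add" /
{
        // Remember when each work item was queued, keyed by work struct pointer.
        @queued[args->work] = nsecs;
}

tracepoint:workqueue:workqueue_execute_start
/ @queued[args->work] /
{
        // Queue-to-execution latency in microseconds.
        @latency_us = hist((nsecs - @queued[args->work]) / 1000);
        delete(@queued[args->work]);
}

Even in this simple case everything has to be filtered by workqueue name and
correlated through the work struct pointer, which is part of what makes such
scripts more complicated and fragile than ones built on dedicated tracepoints.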