Re: [PATCH bpf 5/5] flow_dissector: document BPF flow dissector environment

On Mon, Apr 1, 2019 at 1:57 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
>
> Short doc on what BPF flow dissector should expect in the input
> __sk_buff and flow_keys.
>
> Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> ---
>  .../networking/bpf_flow_dissector.txt         | 115 ++++++++++++++++++
>  1 file changed, 115 insertions(+)
>  create mode 100644 Documentation/networking/bpf_flow_dissector.txt
>
> diff --git a/Documentation/networking/bpf_flow_dissector.txt b/Documentation/networking/bpf_flow_dissector.txt
> new file mode 100644
> index 000000000000..513be8e20afb
> --- /dev/null
> +++ b/Documentation/networking/bpf_flow_dissector.txt
> @@ -0,0 +1,115 @@
> +==================
> +BPF Flow Dissector
> +==================
> +
> +Overview
> +========
> +
> +The flow dissector is a routine that parses metadata out of packets. It is
> +used in various places in the networking subsystem (RFS, flow hash, etc.).
> +
> +The BPF flow dissector is an attempt to reimplement the C-based flow dissector
> +logic in BPF to gain all the benefits of the BPF verifier (namely, limits on
> +the number of instructions and tail calls).
> +
> +API
> +===
> +
> +BPF flow dissector programs operate on an __sk_buff. However, only a
> +limited set of fields is allowed: data, data_end and flow_keys. flow_keys
> +is a 'struct bpf_flow_keys' and contains the flow dissector input and
> +output arguments.
> +
> +The inputs are:
> +  * nhoff - initial offset of the networking header
> +  * thoff - initial offset of the transport header, initialized to nhoff
> +  * n_proto - L3 protocol type, parsed out of L2 header
> +
> +The flow dissector BPF program should fill out the rest of the 'struct
> +bpf_flow_keys' fields. The input arguments nhoff/thoff/n_proto should also be
> +adjusted accordingly.
> +
> +The return code of the BPF program is either BPF_OK to indicate successful
> +dissection, or BPF_DROP to indicate a parsing error.
I don't think this is actually enforced. I believe the current code
just checks if the status is BPF_OK or not, rather than BPF_OK,
BPF_DROP, or neither.
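
A minimal sketch of that behaviour, assuming the caller only compares the
program's return value against BPF_OK. flow_dissect_succeeded() is a
hypothetical name used for illustration, not a kernel function; the enum
values mirror enum bpf_ret_code in include/uapi/linux/bpf.h.

```c
#include <stdbool.h>

/* Values mirror enum bpf_ret_code in include/uapi/linux/bpf.h. */
enum { BPF_OK = 0, BPF_DROP = 2, BPF_REDIRECT = 7 };

/*
 * Hypothetical helper modelling the check described above: any return
 * value other than BPF_OK is treated as a failed dissection, so BPF_DROP
 * is not distinguished from other non-OK codes.
 */
static bool flow_dissect_succeeded(unsigned int ret)
{
	return ret == BPF_OK;
}
```

If that is how the code behaves, the doc could say "BPF_OK for success, any
other value for failure" rather than naming BPF_DROP specifically.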

> +
> +__sk_buff->data
> +===============
> +
> +In the VLAN-less case, this is what the initial state of the BPF flow
> +dissector looks like:
> ++------+------+------------+-----------+
> +| DMAC | SMAC | ETHER_TYPE | L3_HEADER |
> ++------+------+------------+-----------+
> +                            ^
> +                            |
> +                            +-- flow dissector starts here
> +
> +skb->data + flow_keys->nhoff point to the first byte of L3_HEADER.
> +flow_keys->thoff = nhoff
> +flow_keys->n_proto = ETHER_TYPE
> +
> +
> +In the VLAN case, the flow dissector can be called in two different states.
> +
> +Pre-VLAN parsing:
> ++------+------+------+-----+-----------+-----------+
> +| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
> ++------+------+------+-----+-----------+-----------+
> +                      ^
> +                      |
> +                      +-- flow dissector starts here
> +
> +skb->data + flow_keys->nhoff point to the first byte of TCI.
> +flow_keys->thoff = nhoff
> +flow_keys->n_proto = TPID
> +
> +Please note that TPID can be 802.1AD and, hence, the BPF program would
> +have to parse VLAN information twice for double-tagged packets.
> +
> +
> +Post-VLAN parsing:
> ++------+------+------+-----+-----------+-----------+
> +| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
> ++------+------+------+-----+-----------+-----------+
> +                                        ^
> +                                        |
> +                                        +-- flow dissector starts here
> +
> +skb->data + flow_keys->nhoff point to the first byte of L3_HEADER.
> +flow_keys->thoff = nhoff
> +flow_keys->n_proto = ETHER_TYPE
> +
> +In this case the VLAN information has been processed before the flow
> +dissector runs, and the BPF flow dissector is not required to handle it.
> +
> +
> +The takeaway here is as follows: the BPF flow dissector program can be called
> +with or without a VLAN header and should gracefully handle every case: a
> +single VLAN tag, a double VLAN tag, or no tag at all. The same program is
> +called in all of these cases, so it has to be written carefully to handle
> +each of them.
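
For illustration, here is a rough userspace sketch of the VLAN handling
described above. It is not the reference implementation: struct flow_state
and skip_vlan() are hypothetical names, and n_proto is kept in host byte
order for simplicity, whereas the real bpf_flow_keys field is big-endian.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100 /* single VLAN tag */
#define ETH_P_8021AD 0x88A8 /* outer tag of a double-tagged frame */

/* Hypothetical stand-in for the nhoff/n_proto inputs of bpf_flow_keys. */
struct flow_state {
	uint16_t nhoff;   /* offset of the next header to parse */
	uint16_t n_proto; /* protocol found at nhoff (host byte order here) */
};

/*
 * Skip up to two VLAN tags. On entry in the pre-VLAN state, n_proto holds
 * the TPID and nhoff points at the TCI. Each tag is TCI (2 bytes) followed
 * by the next EtherType (2 bytes). If n_proto is not a VLAN TPID, the state
 * is left untouched, which covers the VLAN-less and post-VLAN cases.
 * Returns 0 on success, -1 if the packet is too short.
 */
static int skip_vlan(const uint8_t *data, size_t len, struct flow_state *st)
{
	for (int i = 0; i < 2; i++) {
		if (st->n_proto != ETH_P_8021Q && st->n_proto != ETH_P_8021AD)
			break;
		if ((size_t)st->nhoff + 4 > len)
			return -1;
		st->n_proto = (uint16_t)(data[st->nhoff + 2] << 8 |
					 data[st->nhoff + 3]);
		st->nhoff += 4;
	}
	return 0;
}
```

A real program would also keep thoff in sync with nhoff and use the
verifier-friendly bounds checks against data_end instead of a length.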
> +
> +
> +Reference Implementation
> +========================
> +
> +See tools/testing/selftests/bpf/progs/bpf_flow.c for the reference
> +implementation and tools/testing/selftests/bpf/flow_dissector_load.[hc] for
> +the loader. bpftool can be used to load a BPF flow dissector program as well.
> +
> +The reference implementation is organized as follows:
> +* jmp_table map that contains sub-programs for each supported L3 protocol
> +* _dissect routine - entry point; it does input n_proto parsing and does
> +  bpf_tail_call to the appropriate L3 handler
> +
> +Since BPF at this point doesn't support looping (or any jumping back),
> +jmp_table is used instead to handle multiple levels of encapsulation (and
> +IPv6 options).
> +
> +
> +Current Limitations
> +===================
> +The BPF flow dissector doesn't support exporting all the metadata that the
> +in-kernel C-based implementation can export. A notable example is single VLAN
> +(802.1Q) and double VLAN (802.1AD) tags. Please refer to 'struct bpf_flow_keys'
> +for the set of information that can currently be exported from the BPF context.
> --
> 2.21.0.392.gf8f6787159e-goog
>


