Re: [PATCH bpf 5/5] flow_dissector: document BPF flow dissector environment

On 04/02, Petar Penkov wrote:
> On Mon, Apr 1, 2019 at 1:57 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> >
> > Short doc on what BPF flow dissector should expect in the input
> > __sk_buff and flow_keys.
> >
> > Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> > ---
> >  .../networking/bpf_flow_dissector.txt         | 115 ++++++++++++++++++
> >  1 file changed, 115 insertions(+)
> >  create mode 100644 Documentation/networking/bpf_flow_dissector.txt
> >
> > diff --git a/Documentation/networking/bpf_flow_dissector.txt b/Documentation/networking/bpf_flow_dissector.txt
> > new file mode 100644
> > index 000000000000..513be8e20afb
> > --- /dev/null
> > +++ b/Documentation/networking/bpf_flow_dissector.txt
> > @@ -0,0 +1,115 @@
> > +==================
> > +BPF Flow Dissector
> > +==================
> > +
> > +Overview
> > +========
> > +
> > +The flow dissector is a routine that parses metadata out of packets. It is
> > +used in various places in the networking subsystem (RFS, flow hash, etc.).
> > +
> > +The BPF flow dissector is an attempt to reimplement the C-based flow
> > +dissector logic in BPF to gain all the benefits of the BPF verifier (namely,
> > +limits on the number of instructions and tail calls).
> > +
> > +API
> > +===
> > +
> > +BPF flow dissector programs operate on an __sk_buff. However, only a
> > +limited set of fields is allowed: data, data_end and flow_keys. flow_keys
> > +is a 'struct bpf_flow_keys' and contains the flow dissector input and
> > +output arguments.
> > +
> > +The inputs are:
> > +  * nhoff - initial offset of the network header
> > +  * thoff - initial offset of the transport header, initialized to nhoff
> > +  * n_proto - L3 protocol type, parsed out of the L2 header
> > +
> > +The flow dissector BPF program should fill out the rest of the 'struct
> > +bpf_flow_keys' fields. The input arguments nhoff/thoff/n_proto should also
> > +be adjusted accordingly.
> > +
> > +The return code of the BPF program is either BPF_OK to indicate successful
> > +dissection, or BPF_DROP to indicate a parsing error.
> I don't think this is actually enforced. I believe the current code
> just checks if the status is BPF_OK or not, rather than BPF_OK,
> BPF_DROP, or neither.
It's not universally enforced, but some codepaths in the kernel look at
the returned value (e.g. skb_get_poff and eth_get_headlen), so it's
better to set the expectations :-)
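To illustrate the input/output contract and the return codes described above, here is a minimal user-space C model of an IPv4 handler. It is a sketch under stated assumptions, not a loadable BPF program:

* 'struct model_flow_keys', DISSECT_OK/DISSECT_DROP and model_dissect_ipv4() are hypothetical stand-ins for a few 'struct bpf_flow_keys' fields and for BPF_OK/BPF_DROP.
* n_proto is kept in host byte order for readability.

The model reads the L3 header at data + nhoff and checks every access against data_end. It then fills in ip_proto and the addresses and advances thoff past the IPv4 header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for BPF_OK and BPF_DROP. */
enum dissect_ret { DISSECT_OK, DISSECT_DROP };

/* Hypothetical subset of 'struct bpf_flow_keys'. n_proto is kept in host
 * byte order here; the addresses are copied verbatim (network order). */
struct model_flow_keys {
	uint16_t nhoff;    /* input: offset of the network header */
	uint16_t thoff;    /* input: equals nhoff; output: transport header offset */
	uint16_t n_proto;  /* input: L3 protocol (EtherType) */
	uint8_t  ip_proto; /* output: L4 protocol number */
	uint32_t ipv4_src; /* output */
	uint32_t ipv4_dst; /* output */
};

#define MODEL_ETH_P_IP      0x0800
#define MODEL_IPV4_MIN_HLEN 20

/* Parse an IPv4 header at data + nhoff, checking every read against
 * data_end, and report either success or a parsing error. */
static enum dissect_ret model_dissect_ipv4(const uint8_t *data,
					   const uint8_t *data_end,
					   struct model_flow_keys *keys)
{
	size_t len = (size_t)(data_end - data);
	size_t off = keys->nhoff;
	const uint8_t *iph;
	size_t ihl;

	if (keys->n_proto != MODEL_ETH_P_IP)
		return DISSECT_DROP;           /* not handled by this model */
	if (off + MODEL_IPV4_MIN_HLEN > len)
		return DISSECT_DROP;           /* truncated fixed header */
	iph = data + off;
	if ((iph[0] >> 4) != 4)
		return DISSECT_DROP;           /* wrong IP version */
	ihl = (size_t)(iph[0] & 0x0f) * 4; /* header length incl. options */
	if (ihl < MODEL_IPV4_MIN_HLEN || off + ihl > len)
		return DISSECT_DROP;           /* bogus or truncated IHL */

	keys->ip_proto = iph[9];
	memcpy(&keys->ipv4_src, iph + 12, 4);
	memcpy(&keys->ipv4_dst, iph + 16, 4);
	keys->thoff = (uint16_t)(off + ihl); /* transport header follows */
	return DISSECT_OK;
}
```

A real program would take the same pointers from skb->data, skb->data_end and skb->flow_keys. The verifier rejects direct packet reads that are not preceded by a comparable bounds check.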

> > +
> > +__sk_buff->data
> > +===============
> > +
> > +In the VLAN-less case, this is what the initial state of the BPF flow
> > +dissector looks like:
> > ++------+------+------------+-----------+
> > +| DMAC | SMAC | ETHER_TYPE | L3_HEADER |
> > ++------+------+------------+-----------+
> > +                            ^
> > +                            |
> > +                            +-- flow dissector starts here
> > +
> > +skb->data + flow_keys->nhoff points to the first byte of L3_HEADER.
> > +flow_keys->thoff = nhoff
> > +flow_keys->n_proto = ETHER_TYPE
> > +
> > +
> > +In the case of VLAN, the flow dissector can be called in two different states.
> > +
> > +Pre-VLAN parsing:
> > ++------+------+------+-----+-----------+-----------+
> > +| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
> > ++------+------+------+-----+-----------+-----------+
> > +                      ^
> > +                      |
> > +                      +-- flow dissector starts here
> > +
> > +skb->data + flow_keys->nhoff points to the first byte of TCI.
> > +flow_keys->thoff = nhoff
> > +flow_keys->n_proto = TPID
> > +
> > +Please note that TPID can be 802.1AD and, hence, the BPF program would
> > +have to parse VLAN information twice for double-tagged packets.
> > +
> > +
> > +Post-VLAN parsing:
> > ++------+------+------+-----+-----------+-----------+
> > +| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
> > ++------+------+------+-----+-----------+-----------+
> > +                                        ^
> > +                                        |
> > +                                        +-- flow dissector starts here
> > +
> > +skb->data + flow_keys->nhoff points to the first byte of L3_HEADER.
> > +flow_keys->thoff = nhoff
> > +flow_keys->n_proto = ETHER_TYPE
> > +
> > +In this case, VLAN information has been processed before the flow dissector
> > +is invoked, and the BPF flow dissector is not required to handle it.
> > +
> > +
> > +The takeaway is as follows: a BPF flow dissector program can be called with
> > +an optional VLAN header and should gracefully handle both cases: when a
> > +single or double VLAN tag is present and when none is present. Since the
> > +same program can be invoked in either state, it has to be written carefully
> > +to handle both.
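The VLAN states above can be handled by a single code path. The following user-space C sketch rests on these assumptions:

* As in the diagrams, data points at DMAC.
* In the pre-VLAN state, data + nhoff points at TCI.
* The names (struct vlan_model_keys, pop_tag(), skip_vlan(), VLAN_OK/VLAN_DROP) are hypothetical, and n_proto is in host byte order.
* At most an 802.1AD outer tag followed by an 802.1Q inner tag is accepted.

In the post-VLAN or untagged state, the function leaves the keys untouched.

```c
#include <stddef.h>
#include <stdint.h>

enum vlan_ret { VLAN_OK, VLAN_DROP }; /* stand-ins for BPF_OK / BPF_DROP */

#define MODEL_ETH_P_8021Q  0x8100
#define MODEL_ETH_P_8021AD 0x88A8
#define MODEL_VLAN_HLEN    4 /* TCI (2 bytes) + encapsulated ETHER_TYPE (2 bytes) */

/* Hypothetical subset of 'struct bpf_flow_keys'; n_proto in host order. */
struct vlan_model_keys {
	uint16_t nhoff;
	uint16_t thoff;
	uint16_t n_proto;
};

/* Consume one tag at data + nhoff (which points at TCI). On success, store
 * the encapsulated EtherType in *inner and return 0; return -1 if truncated. */
static int pop_tag(const uint8_t *data, size_t len,
		   struct vlan_model_keys *keys, uint16_t *inner)
{
	size_t off = keys->nhoff;

	if (off + MODEL_VLAN_HLEN > len)
		return -1;
	*inner = (uint16_t)((data[off + 2] << 8) | data[off + 3]);
	keys->nhoff = (uint16_t)(off + MODEL_VLAN_HLEN);
	keys->thoff = keys->nhoff;
	return 0;
}

/* Handle both documented states. In the pre-VLAN state n_proto is a TPID;
 * in the post-VLAN or untagged state it is already the L3 type. The two
 * possible tags are handled by unrolled steps rather than a loop, in the
 * spirit of BPF's no-back-jump restriction. */
static enum vlan_ret skip_vlan(const uint8_t *data, const uint8_t *data_end,
			       struct vlan_model_keys *keys)
{
	size_t len = (size_t)(data_end - data);
	uint16_t inner;

	if (keys->n_proto == MODEL_ETH_P_8021AD) {
		/* outer 802.1AD tag must be followed by an 802.1Q tag */
		if (pop_tag(data, len, keys, &inner) || inner != MODEL_ETH_P_8021Q)
			return VLAN_DROP;
		keys->n_proto = inner;
	}
	if (keys->n_proto == MODEL_ETH_P_8021Q) {
		if (pop_tag(data, len, keys, &inner) ||
		    inner == MODEL_ETH_P_8021Q || inner == MODEL_ETH_P_8021AD)
			return VLAN_DROP; /* truncated, or a third tag */
		keys->n_proto = inner;
	}
	return VLAN_OK; /* nhoff now points at L3_HEADER, n_proto is its type */
}
```

With a single 802.1Q tag, nhoff advances by 4 bytes (TCI plus the encapsulated ETHER_TYPE) and n_proto becomes the inner ETHER_TYPE. A double-tagged packet advances nhoff by 8 bytes.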
> > +
> > +
> > +Reference Implementation
> > +========================
> > +
> > +See tools/testing/selftests/bpf/progs/bpf_flow.c for the reference
> > +implementation and tools/testing/selftests/bpf/flow_dissector_load.[hc] for
> > +the loader. bpftool can be used to load a BPF flow dissector program as well.
> > +
> > +The reference implementation is organized as follows:
> > +* jmp_table map that contains sub-programs for each supported L3 protocol
> > +* _dissect routine - the entry point; it parses the input n_proto and does a
> > +  bpf_tail_call to the appropriate L3 handler
> > +
> > +Since BPF at this point doesn't support looping (or any jumping back),
> > +jmp_table is used instead to handle multiple levels of encapsulation (and
> > +IPv6 options).
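The jmp_table/tail-call structure can be modeled in plain C, with an array of function pointers standing in for the BPF program-array map. This is a hypothetical sketch, not the code in bpf_flow.c: the slot layout, the tail-call cap, the inner_ipip counter and all function names are assumptions. A handler that sees another level of encapsulation re-enters a slot instead of looping. Where bpf_tail_call would fall through (an empty slot) or the cap is exceeded, the model treats the result as a drop. One difference from the real mechanism: in C each "tail call" nests a real function call, whereas a BPF tail call replaces the current program.

```c
#include <stdint.h>

enum tc_ret { TC_OK, TC_DROP }; /* stand-ins for BPF_OK / BPF_DROP */

struct tc_ctx {
	uint16_t n_proto; /* host byte order in this model */
	int inner_ipip;   /* hypothetical: further IPv4-in-IPv4 headers to parse */
	int depth;        /* number of tail calls taken so far */
};

typedef enum tc_ret (*handler_fn)(struct tc_ctx *ctx);

enum { SLOT_IP, SLOT_IPV6, SLOT_VLAN, SLOT_MAX }; /* hypothetical slot layout */

static handler_fn jmp_table[SLOT_MAX]; /* models a BPF_MAP_TYPE_PROG_ARRAY */

#define MODEL_MAX_TAIL_CALLS 8 /* arbitrary cap chosen for this model */

/* An empty slot or an exceeded cap means the "tail call" does not happen;
 * the caller then reports a parsing error. */
static enum tc_ret tail_call(struct tc_ctx *ctx, int slot)
{
	if (slot < 0 || slot >= SLOT_MAX || !jmp_table[slot] ||
	    ctx->depth >= MODEL_MAX_TAIL_CALLS)
		return TC_DROP;
	ctx->depth++;
	return jmp_table[slot](ctx);
}

/* IPv4 handler: re-enter the IPv4 slot for each nested IPv4 header. */
static enum tc_ret handle_ip(struct tc_ctx *ctx)
{
	if (ctx->inner_ipip > 0) {
		ctx->inner_ipip--;
		return tail_call(ctx, SLOT_IP); /* instead of a backward jump */
	}
	return TC_OK;
}

/* Entry point: dispatch on the input n_proto. */
static enum tc_ret dissect(struct tc_ctx *ctx)
{
	switch (ctx->n_proto) {
	case 0x0800: return tail_call(ctx, SLOT_IP);
	case 0x86DD: return tail_call(ctx, SLOT_IPV6);
	case 0x8100:
	case 0x88A8: return tail_call(ctx, SLOT_VLAN);
	default:     return TC_DROP;
	}
}
```

Usage: populate jmp_table[SLOT_IP] = handle_ip before calling dissect(). A packet with two nested IPv4 headers then takes three tail calls in total, while an unpopulated slot (IPv6 here) yields a drop.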
> > +
> > +
> > +Current Limitations
> > +===================
> > +The BPF flow dissector doesn't support exporting all the metadata that the
> > +in-kernel C implementation can export. A notable example is single VLAN
> > +(802.1Q) and double VLAN (802.1AD) tags. Please refer to 'struct bpf_flow_keys'
> > +for the set of information that can currently be exported from the BPF context.
> > --
> > 2.21.0.392.gf8f6787159e-goog
> >


