On 03/26, Alexei Starovoitov wrote:
> On Tue, Mar 26, 2019 at 11:54:56AM -0700, Stanislav Fomichev wrote:
> > On 03/26, Alexei Starovoitov wrote:
> > > On Tue, Mar 26, 2019 at 11:17:19AM -0700, Stanislav Fomichev wrote:
> > > > On 03/26, Alexei Starovoitov wrote:
> > > > > On Tue, Mar 26, 2019 at 10:52 AM Willem de Bruijn
> > > > > <willemdebruijn.kernel@xxxxxxxxx> wrote:
> > > > > > The BPF flow dissector should work the same. It is fine to pass the
> > > > > > data including ethernet header, but parsing can start at nhoff with
> > > > > > proto explicitly passed.
> > > > > >
> > > > > > We should not assume Ethernet link layer.
> > > > >
> > > > > then skb-less dissector has to be different program type
> > > > > because semantics are different.
> > > > The semantics are the same as for c-based __skb_flow_dissect.
> > > > We just need to pass nhoff and proto that has been passed to
> > > > __skb_flow_dissect to the bpf program. In case of with-skb,
> > > > take this initial data from skb, like __skb_flow_dissect does (and don't
> > > > ask BPF program to do it essentially):
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/net/core/flow_dissector.c#n763
> > > >
> > > > I was thinking of passing proto as flow_keys->n_proto and we already
> > > > pass flow_keys->nhoff, so no need to do anything for it. With that,
> > > > BPF program doesn't need to look into skb and can parse optional vlan
> > > > and L3+ headers. The same way __skb_flow_dissect does that.
> > >
> > > makes sense. then I'd also prefer for proto to be in flow_keys to
> > > high light this difference.
> > Maybe rename existing flow_keys->n_proto to flow_keys->proto?
> > That would match __skb_flow_dissect and remove ambiguity with both proto
> > and n_proto in flow_keys.
>
> disabling useless fields in ctx is one thing, since probability of breaking users
> is low, but renaming n_proto is imo too much.
>
> > > may be add vlan_proto/present/tci there as well?
> > > At least on the kernel side ctx rewriter will be the same for w/ & w/o skb cases.
> > Why do you think we need them? My understanding was that when
> > skb_vlan_tag_present(skb) (or skb->vlan_present) returns true, that means
> > that vlan info has been already parsed out of the packet and stored in
> > the vlan_tci/vlan_proto (where vlan_proto is 8021Q/8021AD); skb data
> > points to proper L3 header.
> >
> > If that's correct, BPF flow dissector should not care about that. For
> > example, look at how C-based flow dissector does that:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/net/core/flow_dissector.c#n944
> >
> > If skb_vlan_tag_present(skb) returns true, we set proto to skb->protocol
> > and move on.
> >
> > But, we would need vlan_proto/present/tci in the flow_keys in the future.
> > We don't currently return parsed vlan data from the BPF flow dissector.
> > But it feels like it's getting into bpf-next territory :-)
>
> Whether ctx->data points to L2 or L3 is uapi regardless whether
> progs/bpf_flow.c is relying on that or not.
> So far I think you're saying that in all three cases:
> no-skb, skb befor rfs, skb after rfs ctx->data points to L2, right?
> This has to be preserved.
It points to L3 (or vlan). And this will be preserved, I have no intention
to change that. Just to make sure, we are on the same page, here is what
__skb_flow_dissect (and BPF prog) is seeing in nhoff.

NO-VLAN is always the same for both with-skb/no-skb:
+----+----+-----+--+
|DMAC|SMAC|PROTO|L3|
+----+----+-----+--+
                 ^
                 +-- nhoff
                     proto = PROTO

VLAN no-skb (eth_get_headlen):
+----+----+----+---+-----+--+
|DMAC|SMAC|TPID|TCI|PROTO|L3|
+----+----+----+---+-----+--+
                ^
                +-- nhoff
                    proto = TPID

VLAN with-skb, RFS (pre __netif_receive_skb_core):
+----+----+----+---+-----+--+
|DMAC|SMAC|TPID|TCI|PROTO|L3|
+----+----+----+---+-----+--+
                ^
                +-- nhoff
                    proto = TPID

VLAN with-skb, post RFS (post __netif_receive_skb_core / skb_vlan_untag):
+----+----+----+---+-----+--+
|DMAC|SMAC|TPID|TCI|PROTO|L3|
+----+----+----+---+-----+--+
                          ^
                          +-- nhoff
                              proto = PROTO

And in the last case, networking stack sets:
* skb->vlan_present to true
* skb->vlan_proto to TPID
* skb->vlan_tci to TCI
* skb->protocol to PROTO
* pulls vlan header, so skb->data points to L3 header

> Only now after reading bpf_flow.c for Nth time I realized what semantics
> you gave to skb->vlan* and skb->protocol fields. All of them have
> to be kept as-is.
Don't read too much into current bpf_flow.c, I don't think it really works
with vlans in all the cases :-/ It always looks back, assuming post RFS
situation; that needs to be changed by dropping that
"if (!skb->vlan_present)" and just looking into input 'proto' (and
optionally parsing vlan hdr if proto == 802.1q/ad, which we already, sort
of, do). I'm gonna add a small testcase for BPF_PROG_TEST_RUN.

> For no-skb cases all of them should be available with the same logic
> and it has to documented, since it's different from other bpf progs
> that access these fields.
I feel like dropping those vlan_{present,proto,tci} from bpf flow
dissector. It should not care what's in the skb and should just rely on
the input 'proto' to optionally parse vlan header.

+1 on documenting all of that
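
To make the "just rely on the input 'proto'" idea a bit more concrete, here
is a rough userspace sketch of the logic. It is not the actual bpf_flow.c
and not kernel code: skip_vlan() and struct l3_start are made-up names for
illustration, the ETH_P_* values are redefined locally, and I'm assuming
nhoff/proto arrive exactly as described above (nhoff right after the
EtherType that 'proto' was read from, proto in host byte order).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP     0x0800  /* IPv4 */
#define ETH_P_8021Q  0x8100  /* 802.1Q tag (TPID) */
#define ETH_P_8021AD 0x88A8  /* 802.1ad tag (TPID) */
#define VLAN_HLEN    4       /* TCI (2 bytes) + encapsulated EtherType (2 bytes) */

/* Hypothetical result type for this sketch, not a kernel structure. */
struct l3_start {
	int      ok;    /* 1 if an L3 start was found, 0 if the data was truncated */
	uint16_t proto; /* EtherType of the header at nhoff, host byte order */
	size_t   nhoff; /* offset of that header from the start of the data */
};

/* Read a big-endian 16-bit value from the packet. */
static uint16_t load_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Start from the (nhoff, proto) pair handed in by the caller and never look
 * at skb->vlan_*. If proto is a VLAN TPID, nhoff points at the TCI, so step
 * over TCI + inner EtherType and repeat (handles stacked tags as well).
 */
static struct l3_start skip_vlan(const uint8_t *pkt, size_t len,
				 size_t nhoff, uint16_t proto)
{
	struct l3_start r = { 0, proto, nhoff };

	assert(pkt != NULL);

	while (r.proto == ETH_P_8021Q || r.proto == ETH_P_8021AD) {
		if (r.nhoff + VLAN_HLEN > len)
			return r; /* truncated tag, r.ok stays 0 */
		r.proto = load_be16(pkt + r.nhoff + 2);
		r.nhoff += VLAN_HLEN;
	}

	r.ok = 1;
	return r;
}
```

With this shape the pre-untag cases (nhoff at TCI, proto = TPID) and the
post-untag case (nhoff at L3, proto = PROTO) go through the same function
and end up at the same L3 offset. A real BPF program would need a bounded
loop and verifier-friendly bounds checks, which the sketch ignores.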