On Mon, Dec 11, 2023 at 10:00 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > > Martin KaFai Lau <martin.lau@xxxxxxxxx> writes: > > > On 12/8/23 2:15 AM, Toke Høiland-Jørgensen wrote: > >> Martin KaFai Lau <martin.lau@xxxxxxxxx> writes: > >> > >>> On 12/1/23 10:29 AM, Jamal Hadi Salim wrote: > >>>> We add an initial set of kfuncs to allow interactions from eBPF programs > >>>> to the P4TC domain. > >>>> > >>>> - bpf_p4tc_tbl_read: Used to lookup a table entry from a BPF > >>>> program installed in TC. To find the table entry we take in an skb, the > >>>> pipeline ID, the table ID, a key and a key size. > >>>> We use the skb to get the network namespace structure where all the > >>>> pipelines are stored. After that we use the pipeline ID and the table > >>>> ID, to find the table. We then use the key to search for the entry. > >>>> We return an entry on success and NULL on failure. > >>>> > >>>> - xdp_p4tc_tbl_read: Used to lookup a table entry from a BPF > >>>> program installed in XDP. To find the table entry we take in an xdp_md, > >>>> the pipeline ID, the table ID, a key and a key size. > >>>> We use struct xdp_md to get the network namespace structure where all > >>>> the pipelines are stored. After that we use the pipeline ID and the table > >>>> ID, to find the table. We then use the key to search for the entry. > >>>> We return an entry on success and NULL on failure. > >>>> > >>>> - bpf_p4tc_entry_create: Used to create a table entry from a BPF > >>>> program installed in TC. To create the table entry we take an skb, the > >>>> pipeline ID, the table ID, a key and its size, and an action which will > >>>> be associated with the new entry. > >>>> We return 0 on success and a negative errno on failure > >>>> > >>>> - xdp_p4tc_entry_create: Used to create a table entry from a BPF > >>>> program installed in XDP. To create the table entry we take an xdp_md, the > >>>> pipeline ID, the table ID, a key and its size, and an action which will > >>>> be associated with the new entry. > >>>> We return 0 on success and a negative errno on failure > >>>> > >>>> - bpf_p4tc_entry_create_on_miss: conforms to PNA "add on miss". > >>>> First does a lookup using the passed key and upon a miss will add the entry > >>>> to the table. > >>>> We return 0 on success and a negative errno on failure > >>>> > >>>> - xdp_p4tc_entry_create_on_miss: conforms to PNA "add on miss". > >>>> First does a lookup using the passed key and upon a miss will add the entry > >>>> to the table. > >>>> We return 0 on success and a negative errno on failure > >>>> > >>>> - bpf_p4tc_entry_update: Used to update a table entry from a BPF > >>>> program installed in TC. To update the table entry we take an skb, the > >>>> pipeline ID, the table ID, a key and its size, and an action which will > >>>> be associated with the new entry. > >>>> We return 0 on success and a negative errno on failure > >>>> > >>>> - xdp_p4tc_entry_update: Used to update a table entry from a BPF > >>>> program installed in XDP. To update the table entry we take an xdp_md, the > >>>> pipeline ID, the table ID, a key and its size, and an action which will > >>>> be associated with the new entry. > >>>> We return 0 on success and a negative errno on failure > >>>> > >>>> - bpf_p4tc_entry_delete: Used to delete a table entry from a BPF > >>>> program installed in TC. To delete the table entry we take an skb, the > >>>> pipeline ID, the table ID, a key and a key size. > >>>> We return 0 on success and a negative errno on failure > >>>> > >>>> - xdp_p4tc_entry_delete: Used to delete a table entry from a BPF > >>>> program installed in XDP. To delete the table entry we take an xdp_md, the > >>>> pipeline ID, the table ID, a key and a key size. > >>>> We return 0 on success and a negative errno on failure > >>> > >>> [ ... ] > >>> > >>>> +BTF_SET8_START(p4tc_kfunc_check_tbl_set_skb) > >>>> +BTF_ID_FLAGS(func, bpf_p4tc_tbl_read, KF_RET_NULL); > >>>> +BTF_ID_FLAGS(func, bpf_p4tc_entry_create); > >>>> +BTF_ID_FLAGS(func, bpf_p4tc_entry_create_on_miss); > >>>> +BTF_ID_FLAGS(func, bpf_p4tc_entry_update); > >>>> +BTF_ID_FLAGS(func, bpf_p4tc_entry_delete); > >>>> +BTF_SET8_END(p4tc_kfunc_check_tbl_set_skb) > >>> > >>> These create/read/update/delete kfuncs are like defining a new hidden bpf map > >>> type in the kernel. bpf prog can now create its own link-list and rbtree. > >>> sched_ext has already been using it. This is the way the bpf prog should use > >>> instead of creating a new map type. > >> > >> I don't really think this is an accurate assessment, given Jamal's use > >> case. These kfuncs are more akin to the FIB lookup helper, or the > >> netfilter kfuncs: they provide lookup into a kernel-internal data > >> structure, so that BPF can access that data structure while staying in > >> sync with the rest of the kernel. > >> > >> If this was a BPF-only implementation you'd be right, but given the > >> constraint of having the P4 objects represented in the kernel[0], I > >> think this is a perfectly reasonable use of kfuncs, even though they > >> happen to look like the map API. > >> > >> -Toke > >> > >> [0] Whether having those objects represented at all is reasonable is a > >> separate discussion, which I believe John et al are having with Jamal in > >> a separate subthread. I don't personally have any strong objections to > >> doing that. > > > > I might not be clear. It was my question on why it has to be in the kernel > > instead of in the bpf map, so the earlier bpf link-list and rbtree example just > > in case this recent bpf capability has not been considered. > > A bit tangential, but it came to mind while thinking about this: how > would one go about updating a bpf rbtree-based data structure from > userspace? Is there a way to get bpf_map_update()-semantics that inserts > things into the rbtree somehow? > > > If it is an existing kernel infra-structure, kfunc is a reasonable use. > > > > The P4 objects are newly added to this set with bpf program as its user. It can > > be represented in the bpf map as well instead of in the kernel. > > > > or is it fair to say that bpf prog is not the primary consumer of the P4 > > objects. Instead kernel is the primary user of the p4 objects such that p4tc can > > work independently without the bpf piece to begin with and bpf could be > > considered as an extension later? > > That's a good question, actually. I think that conceptually, if viewed > purely as a control plane, it could be merged separately and the BPF > support added later. But with this series, that would make it a control > plane that doesn't really control anything; so there would need to be a > second consumer (hardware offload?) added for that to make sense, I > suppose. > > Or to put it another way, the way this series is designed, there is an > implicit "these are kernel objects that we want to use for other things" > assumption in there; it's just that those "other things" are not part > of this series (because hardware offload doesn't exist yet - I think? > I'll let Jamal answer that). I can see the point of asking for that > second user, though, as that would make it clear why the control plane > needs to be in the kernel. Yes, HW is also a consumer via TC but these patches are for the s/w only piece. Martin, to get some understanding, see the last slide on https://netdevconf.info/0x17/sessions/talk/integrating-ebpf-into-the-p4tc-datapath.html If you want to see the history of what changed from V1 for the s/w side, see slide #14. And if you have the patience, look at the whole slide set, and read the abstract. If clarity is still lacking then the cover letter. And if you have even more patience then the relevant commit messages but more importantly we need help to review if we are making any amateur mistakes for the kfunc piece. TC (which is used for such offloads, see u32, flower, etc) has its own model for config/control driven via netlink. There are more objects than just tables (eg pipelines, actions, externs etc) that all are _attached to netns_ where they were instantiated - so we have a few more kfuncs for those objects which are not part of this patchset (since the rules say 15 patches is the max). Hope that helps. cheers, jamal > -Toke >