Martin KaFai Lau <martin.lau@xxxxxxxxx> writes: > On 12/8/23 2:15 AM, Toke Høiland-Jørgensen wrote: >> Martin KaFai Lau <martin.lau@xxxxxxxxx> writes: >> >>> On 12/1/23 10:29 AM, Jamal Hadi Salim wrote: >>>> We add an initial set of kfuncs to allow interactions from eBPF programs >>>> to the P4TC domain. >>>> >>>> - bpf_p4tc_tbl_read: Used to lookup a table entry from a BPF >>>> program installed in TC. To find the table entry we take in an skb, the >>>> pipeline ID, the table ID, a key and a key size. >>>> We use the skb to get the network namespace structure where all the >>>> pipelines are stored. After that we use the pipeline ID and the table >>>> ID, to find the table. We then use the key to search for the entry. >>>> We return an entry on success and NULL on failure. >>>> >>>> - xdp_p4tc_tbl_read: Used to lookup a table entry from a BPF >>>> program installed in XDP. To find the table entry we take in an xdp_md, >>>> the pipeline ID, the table ID, a key and a key size. >>>> We use struct xdp_md to get the network namespace structure where all >>>> the pipelines are stored. After that we use the pipeline ID and the table >>>> ID, to find the table. We then use the key to search for the entry. >>>> We return an entry on success and NULL on failure. >>>> >>>> - bpf_p4tc_entry_create: Used to create a table entry from a BPF >>>> program installed in TC. To create the table entry we take an skb, the >>>> pipeline ID, the table ID, a key and its size, and an action which will >>>> be associated with the new entry. >>>> We return 0 on success and a negative errno on failure >>>> >>>> - xdp_p4tc_entry_create: Used to create a table entry from a BPF >>>> program installed in XDP. To create the table entry we take an xdp_md, the >>>> pipeline ID, the table ID, a key and its size, and an action which will >>>> be associated with the new entry. >>>> We return 0 on success and a negative errno on failure >>>> >>>> - bpf_p4tc_entry_create_on_miss: conforms to PNA "add on miss". >>>> First does a lookup using the passed key and upon a miss will add the entry >>>> to the table. >>>> We return 0 on success and a negative errno on failure >>>> >>>> - xdp_p4tc_entry_create_on_miss: conforms to PNA "add on miss". >>>> First does a lookup using the passed key and upon a miss will add the entry >>>> to the table. >>>> We return 0 on success and a negative errno on failure >>>> >>>> - bpf_p4tc_entry_update: Used to update a table entry from a BPF >>>> program installed in TC. To update the table entry we take an skb, the >>>> pipeline ID, the table ID, a key and its size, and an action which will >>>> be associated with the new entry. >>>> We return 0 on success and a negative errno on failure >>>> >>>> - xdp_p4tc_entry_update: Used to update a table entry from a BPF >>>> program installed in XDP. To update the table entry we take an xdp_md, the >>>> pipeline ID, the table ID, a key and its size, and an action which will >>>> be associated with the new entry. >>>> We return 0 on success and a negative errno on failure >>>> >>>> - bpf_p4tc_entry_delete: Used to delete a table entry from a BPF >>>> program installed in TC. To delete the table entry we take an skb, the >>>> pipeline ID, the table ID, a key and a key size. >>>> We return 0 on success and a negative errno on failure >>>> >>>> - xdp_p4tc_entry_delete: Used to delete a table entry from a BPF >>>> program installed in XDP. To delete the table entry we take an xdp_md, the >>>> pipeline ID, the table ID, a key and a key size. >>>> We return 0 on success and a negative errno on failure >>> >>> [ ... ] >>> >>>> +BTF_SET8_START(p4tc_kfunc_check_tbl_set_skb) >>>> +BTF_ID_FLAGS(func, bpf_p4tc_tbl_read, KF_RET_NULL); >>>> +BTF_ID_FLAGS(func, bpf_p4tc_entry_create); >>>> +BTF_ID_FLAGS(func, bpf_p4tc_entry_create_on_miss); >>>> +BTF_ID_FLAGS(func, bpf_p4tc_entry_update); >>>> +BTF_ID_FLAGS(func, bpf_p4tc_entry_delete); >>>> +BTF_SET8_END(p4tc_kfunc_check_tbl_set_skb) >>> >>> These create/read/update/delete kfuncs are like defining a new hidden bpf map >>> type in the kernel. bpf prog can now create its own link-list and rbtree. >>> sched_ext has already been using it. This is the way the bpf prog should use >>> instead of creating a new map type. >> >> I don't really think this is an accurate assessment, given Jamal's use >> case. These kfuncs are more akin to the FIB lookup helper, or the >> netfilter kfuncs: they provide lookup into a kernel-internal data >> structure, so that BPF can access that data structure while staying in >> sync with the rest of the kernel. >> >> If this was a BPF-only implementation you'd be right, but given the >> constraint of having the P4 objects represented in the kernel[0], I >> think this is a perfectly reasonable use of kfuncs, even though they >> happen to look like the map API. >> >> -Toke >> >> [0] Whether having those objects represented at all is reasonable is a >> separate discussion, which I believe John et al are having with Jamal in >> a separate subthread. I don't personally have any strong objections to >> doing that. > > I might not be clear. It was my question on why it has to be in the kernel > instead of in the bpf map, so the earlier bpf link-list and rbtree example just > in case this recent bpf capability has not been considered. A bit tangential, but it came to mind while thinking about this: how would one go about updating a bpf rbtree-based data structure from userspace? Is there a way to get bpf_map_update()-semantics that inserts things into the rbtree somehow? > If it is an existing kernel infra-structure, kfunc is a reasonable use. > > The P4 objects are newly added to this set with bpf program as its user. It can > be represented in the bpf map as well instead of in the kernel. > > or is it fair to say that bpf prog is not the primary consumer of the P4 > objects. Instead kernel is the primary user of the p4 objects such that p4tc can > work independently without the bpf piece to begin with and bpf could be > considered as an extension later? That's a good question, actually. I think that conceptually, if viewed purely as a control plane, it could be merged separately and the BPF support added later. But with this series, that would make it a control plane that doesn't really control anything; so there would need to be a second consumer (hardware offload?) added for that to make sense, I suppose. Or to put it another way, the way this series is designed, there is an implicit "these are kernel objects that we want to use for other things" assumption in there; it's just that those "other things" are not part of this series (because hardware offload doesn't exist yet - I think? I'll let Jamal answer that). I can see the point of asking for that second user, though, as that would make it clear why the control plane needs to be in the kernel. -Toke