On Thu, Apr 4, 2024 at 8:23 AM Jamal Hadi Salim <jhs@xxxxxxxxxxxx> wrote: > > > This is the first patchset of two. In this patch we are submitting 15 which > cover the minimal viable P4 PNA architecture. > Please, if you want to discuss a slightly tangential subject like offload > or even your politics then start another thread with a different subject > line. The way you do it is to change the subject line to for example > "<Your New Subject here> (WAS: <original subject line here>)". > > In this cover letter i am restoring text i took out in V10 which stated > "our requirements". > > Martin, please look at patch 14 again. The bpf selftests for kfuncs is > sloted for series 2. Paolo, please take a look at 1, 3, 6 for the changes > you suggested. Marcelo, because we made changes to patch 14, I have > removed your reviewed-by. Can you please take another look at that patch? Sorry, Marcelo - you already reviewed and we restored your reviewed-by. cheers, jamal > > __Description of these Patches__ > > These Patches are constrained entirely within the TC domain with very tiny > changes made in patch 1-5. eBPF is used as an infrastructure component for > the software datapath and no changes are made to any eBPF code, only kfuncs > are introduced in patch 14. > > Patch #1 adds infrastructure for per-netns P4 actions that can be created on > as need basis for the P4 program requirement. This patch makes a small > incision into act_api. Patches 2-4 are minimalist enablers for P4TC and have > no effect on the classical tc action (example patch#2 just increases the size > of the action names from 16->64B). > Patch 5 adds infrastructure support for preallocation of dynamic actions > needed for P4. > > The core P4TC code implements several P4 objects. > 1) Patch #6 introduces P4 data types which are consumed by the rest of the > code > 2) Patch #7 introduces the templating API. i.e. CRUD commands for templates > 3) Patch #8 introduces the concept of templating Pipelines. i.e CRUD > commands for P4 pipelines. > 4) Patch #9 introduces the action templates and associated CRUD commands. > 5) Patch #10 introduce the action runtime infrastructure. > 6) Patch #11 introduces the concept of P4 table templates and associated > CRUD commands for tables. > 7) Patch #12 introduces runtime table entry infra and associated CU > commands. > 8) Patch #13 introduces runtime table entry infra and associated RD > commands. > 9) Patch #14 introduces interaction of eBPF to P4TC tables via kfunc. > 10) Patch #15 introduces the TC classifier P4 used at runtime. > > There are a few more patches not in this patchset that deal with externs, > test cases, etc. > > What is P4? > ----------- > > The Programming Protocol-independent Packet Processors (P4) is an open > source, domain-specific programming language for specifying data plane > behavior. > > The current P4 landscape includes an extensive range of deployments, > products, projects and services, etc[9][12]. Two major NIC vendors, > Intel[10] and AMD[11] currently offer P4-native NICs. P4 is currently > curated by the Linux Foundation[9]. > > A lot more on why P4 - see small treatise here:[4]. > > What is P4TC? > ------------- > > P4TC is a net-namespace aware P4 implementation over TC; meaning, a P4 > program and its associated objects and state are attachend to a kernel > _netns_ structure. > IOW, if we had two programs across netns' or within a netns they have no > visibility to each others objects (unlike for example TC actions whose > kinds are "global" in nature or eBPF maps visavis bpftool). > > P4TC builds on top of many years of Linux TC experiences of a netlink > control path interface coupled with a software datapath with an equivalent > offloadable hardware datapath. In this patch series we are focussing only > on the s/w datapath. The s/w and h/w path equivalence that TC provides is > relevant for a primary use case of P4 where some (currently) large consumers > of NICs provide vendors their datapath specs in P4. In such a case one could > generate specified datapaths in s/w and test/validate the requirements > before hardware acquisition(example [12]). > > Unlike other approaches such as TC Flower which require kernel and user > space changes when new datapath objects like packet headers are introduced > P4TC requires zero kernel or user space changes. We refer to this as: > _kernel and user space code change independence_. > Meaning: > A P4 program describes headers, how to parse, etc alongside prescribing > the datapath processing logic; the compiler uses the P4 program as input > and generates several artifacts which are then loaded into the kernel to > manifest the intended datapath. In addition to the generated datapath, > control path constructs are generated. The process is described further > below in "P4TC Workflow". > > Some History > ------------ > > There have been many discussions and meetings within the community since > about 2015 in regards to P4 over TC[2] and we are finally proving to the > naysayers that we do get stuff done! > > A lot more of the P4TC motivation is captured at: > https://github.com/p4tc-dev/docs/blob/main/why-p4tc.md > > __P4TC Architecture__ > > The current architecture was described at netdevconf 0x17[14] and if you > prefer academic conference papers, a short paper is available here[15]. > > There are 4 parts: > > 1) A Template CRUD provisioning API for manifesting a P4 program and its > associated objects in the kernel. The template provisioning API uses > netlink. See patch in part 2. > > 2) A Runtime CRUD+ API code which is used for controlling the different > runtime behavior of the P4 objects. The runtime API uses netlink. See notes > further down. See patch descriptions... > > 3) P4 objects and their control interfaces: tables, actions, externs, etc. > Any object that requires control plane interaction resides in the TC domain > and is subject to the CRUD runtime API. The intended goal is to make use > of the tc semantics of skip_sw/hw to target P4 program objects either in s/w > or h/w. > > 4) S/W Datapath code hooks. The s/w datapath is eBPF based and is generated > by a compiler based on the P4 spec. When accessing any P4 object that > requires control plane interfaces, the eBPF code accesses the P4TC side > from #3 above using kfuncs. > > The generated eBPF code is derived from [13] with enhancements and fixes to > meet our requirements. > > __P4TC Workflow__ > > The Development and instantiation workflow for P4TC is as follows: > > A) A developer writes a P4 program, "myprog" > > B) Compiles it using the P4C compiler[8]. The compiler generates 3 > outputs: > > a) A shell script which form template definitions for the different P4 > objects "myprog" utilizes (tables, externs, actions etc). See #1 > above > > b) The parser and the rest of the datapath are generated as eBPF and > need to be compiled into binaries. At the moment the parser and the > main control block are generated as separate eBPF program but this > could change in the future (without affecting any kernel code). > See #4 above. > > c) A json introspection file used for the control plane > (by iproute2/tc). > > C) At this point the artifacts from #1,#4 could be handed to an operator > (the operator could be the same person as the developer from #A, #B). > > i) For the eBPF part, either the operator is handed an ebpf binary or > source which they compile at this point into a binary. > The operator executes the shell script(s) to manifest the functional > "myprog" into the kernel. > > ii) The operator instantiates "myprog" pipeline via the tc P4 filter > to ingress/egress (depending on P4 arch) of one or more netdevs/ports > (illustrated below as "block 22"). > > Example instantion where the parser is a separate action: > "tc filter add block 22 ingress protocol all prio 10 \ > p4 pname myprog \ > action bpf obj $PARSER.o section p4tc/parse \ > action bpf obj $PROGNAME.o section p4tc/main" > > See individual patches in partc for more examples tc vs xdp etc. Also see > section on "challenges" (further below on this cover letter). > > Once "myprog" P4 program is instantiated one can start performing operations > on table entries and/or actions at runtime as described below. > > __P4TC Runtime Control Path__ > > The control interface builds on past tc experience and tries to get things > right from the beginning (example filtering is separated from depending > on existing object TLVs and made generic); also the code is written in > such a way it is mostly lockless. > > The P4TC control interface, using netlink, provides what we call a CRUDPS > abstraction which stands for: Create, Read(get), Update, Delete, Subscribe, > Publish. From a high level PoV the following describes a conformant high > level API (both on netlink data model and code level): > > Create(</path/to/object, DATA>+) > Read(</path/to/object>, [optional filter]) > Update(</path/to/object>, DATA>+) > Delete(</path/to/object>, [optional filter]) > Subscribe(</path/to/object>, [optional filter]) > > Note, we _dont_ treat "dump" or "flush" as speacial. If "path/to/object" > points to a table then a "Delete" implies "flush" and a "Read" implies dump > but if it points to an entry (by specifying a key) then "Delete" implies > deleting and entry and "Read" implies reading that single entry. It should > be noted that both "Delete" and "Read" take an optional filter parameter. > The filter can define further refinements to what the control plane wants > read or deleted. > "Subscribe" uses built in netlink event management. It, as well, takes a > filter which can further refine what events get generated to the control > plane (taken out of this patchset, to be re-added with consideration of > [16]). > > Lets show some runtime samples: > > ..create an entry, if we match ip address 10.0.1.2 send packet out eno1 > tc p4ctrl create myprog/table/mytable \ > dstAddr 10.0.1.2/32 action send_to_port param port eno1 > > ..Batch create entries > tc p4ctrl create myprog/table/mytable \ > entry dstAddr 10.1.1.2/32 action send_to_port param port eno1 \ > entry dstAddr 10.1.10.2/32 action send_to_port param port eno10 \ > entry dstAddr 10.0.2.2/32 action send_to_port param port eno2 > > ..Get an entry (note "read" is interchangeably used as "get" which is a > common semantic in tc): > tc p4ctrl read myprog/table/mytable \ > dstAddr 10.0.2.2/32 > > ..dump mytable > tc p4ctrl read myprog/table/mytable > > ..dump mytable for all entries whose key fits within 10.1.0.0/16 > tc p4ctrl read myprog/table/mytable \ > filter key/myprog/mytable/dstAddr = 10.1.0.0/16 > > ..dump all mytable entries which have an action send_to_port with param "eno1" > tc p4ctrl get myprog/table/mytable \ > filter param/act/myprog/send_to_port/port = "eno1" > > The filter expression is powerful, f.e you could say: > > tc p4ctrl get myprog/table/mytable \ > filter param/act/myprog/send_to_port/port = "eno1" && \ > key/myprog/mytable/dstAddr = 10.1.0.0/16 > > It also works on built in metadata, example in the following case dumping > entries from mytable that have seen activity in the last 10 secs: > tc p4ctrl get myprog/table/mytable \ > filter msecs_since < 10000 > > Delete follows the same syntax as get/read, so for sake of brevity we won't > show more example than how to flush mytable: > > tc p4ctrl delete myprog/table/mytable > > Mystery question: How do we achieve iproute2-kernel independence and > how does "tc p4ctrl" as a cli know how to program the kernel given an > arbitrary command line as shown above? Answer(s): It queries the > compiler generated json file in "P4TC Workflow" #B.c above. The json file > has enough details to figure out that we have a program called "myprog" > which has a table "mytable" that has a key name "dstAddr" which happens to > be type ipv4 address prefix. The json file also provides details to show > that the table "mytable" supports an action called "send_to_port" which > accepts a parameter "port" of type netdev (see the types patch for all > supported P4 data types). > All P4 components have names, IDs, and types - so this makes it very easy > to map into netlink. > Once user space tc/p4ctrl validates the human command input, it creates > standard binary netlink structures (TLVs etc) which are sent to the kernel. > See the runtime table entry patch for more details. > > __P4TC Datapath__ > > The P4TC s/w datapath execution is generated as eBPF. Any objects that > require control interfacing reside in the "P4TC domain" and are controlled > via netlink as described above. Per packet execution and state and even > objects that do not require control interfacing (like the P4 parser) are > generated as eBPF. > > A packet arriving on s/w ingress of any of the ports on block 22 > (illustrated in section "P4TC Workflow" above will first be exercised via > the (generated eBPF) parser component to extract the headers (the ip > destination address labeled "dstAddr" above in section "P4TC Runtime > Control Path"). The datapath then proceeds to use "dstAddr", table ID > and pipeline ID as a key to do a lookup in myprog's "mytable" which returns > the action params which are then used to execute the action in the eBPF > datapath (eventually sending out packets to eno1). > On a table miss, mytable's default miss action (not described) is executed. > > __Testing__ > > Speaking of testing - we have 2-300 tdc test cases (which will be in the > second patchset). > These tests are run on our CICD system on pull requests and after commits > are approved. The CICD does a lot of other tests (more since v2, thanks to > Simon's input)including: > checkpatch, sparse, smatch, coccinelle, 32 bit and 64 bit builds tested on > both X86, ARM 64 and emulated BE via qemu s390. We trigger performance > testing in the CICD to catch performance regressions (currently only on > the control path, but in the future for the datapath). > Syzkaller runs 24/7 on dedicated hardware, originally we focussed only on > memory sanitizer but recently added support for concurrency sanitizer. > Before main releases we ensure each patch will compile on its own to help > in git bisect and run the xmas tree tool. We eventually put the code via > coverity. > > In addition we are working on enabling a tool that will take a P4 program, > run it through the compiler, and generate permutations of traffic patterns > via symbolic execution that will test both positive and negative datapath > code paths. The test generator tool integration is still work in progress. > Also: We have other code that test parallelization etc which we are trying > to find a fit for in the kernel tree's testing infra. > > __Restating Our Requirements__ > > Given this code is not intrusive at all because it only touches TC. > We would like to emphasize that we see eBPF as _infrastructure tooling > available to us and not the end goal_. Please help us with technical input > on for example how we can do better kfuncs, etc. If you want to critique, > then our requirements should be your guide and please be considerate that > this is about P4, not eBPF. IOW: > We would appreciate technical commentary instead of bikeshedding on how > _you_ would have implemented this probably with more eBPF or some other > clever tricks. It is sad to see there was zero input from anyone in the eBPF > world for 7 RFC postings (in a period of 9 months). > If i am ranting here is because we have spent over a year now on this > topic - we have taken the initial input and have given you eBPF. So lets > make progress please. > > The initial release was presented in October 2022[20] and RFC in January > 2023 had a "scriptable" datapath (the idea built on the u32 classifier[17] > and pedit action[18] approach. Post RFC V1, we made changes to fit the > feedback to integrate eBPF to replace the "scriptable" software datapath. > On our part, the goal for the change was to meet folks in the middle as a > compromise. > No regrets on the journey since after all the effort because we ended > getting XDP which was not in the original picture. Some of our efforts are > captured at [1][3] and in the patch history. > > In this section we review the original scriptable version against the > current implementation which uses eBPF and in the process re-enumerate our > requirements. > > To be very clear: Our intention for P4TC is to target _the TC crowd_. > Essentially developers and ops people already familiar and deploying TC > based infra. > More importantly the original intent for P4TC was to enable _ops folks_ > more than devs (given code is being generated and doesn't need humans to > write it). > > With TC, we gain the whole "familiar" package of match-action pipeline > abstraction++, meaning from the control plane(see discussion above) all > the way to the tooling infra, i.e iproute2/tc cli, netlink infra interface > (request/response, event subscribe/multicast-publish, congestion control > etc), s/w and h/w symbiosis, the autonomous kernel control, etc. > The main advantage over vendor specific implementations(which is the current > alternative) is: with P4TC we have a singular vendor-neutral interface via > the kernel using well understood mechanisms that have gained learnings from > deployment experience. > > So lets list some of these requirements and compare whether moving to eBPF > affected us or gave us an advantage. > > 0) Understood Control Plane semantics > > This requirement is unaffected. > The control plane remains as netlink and therefore we get the classical > multi-user CRUD+Publish/subscribe APIs built in. > > 1) Must support SW/HW equivalence > > This requirement is unaffected. The control plane is netlink. Any semantics > to select between sw and hw via skip_sw/hw semantics is maintained. > > 2) Supporting expressibility of the universe set of P4 progs > > It is a must to support 100% of all possible P4 programs. In the past the > eBPF verifier, for example in [13], had to be worked around and even then > there are cases where we couldnt avoid path explosion when branching isi > involved and failed to run. So we were skeptical about using eBPF to begin > with. > Kfuncs changed our minds. Note, there are still challenges running all > potential P4 programs at the XDP level - but the pipeline could be split > between XDP and TC in such cases. The compiler can be told to generate > pieces that run on XDP and other on TC (see examples). > Summary: This requirement is unaffected. > > 3) Operational usability > > By maintaining the TC control plane (even in presence of eBPF datapath) > runtime aspects remain unchanged. So for our target audience of folks > who have deployed tc, including offloads, the comfort zone is unchanged. > > There is some loss in operational usability because we now have more knobs: > the extra compilation, loading and syncing of ebpf binaries, etc. > IOW, I can no longer just ship someone a shell script(ascii) in an email to > someone and say "go run this and "myprog" will just work". > > 4) Operational and development Debuggability > > If something goes wrong, the tc craftsperson is now required to have > additional knowledge of eBPF code and process. > Our intent is to compensate this challenge with debug tools that ease the > craftperson's debugging. > > 5) Opportunity for rapid prototyping of new ideas > > This is not exactly a requirement but something that became a useful > feature during the P4TC development phase. When the compiler was lagging > behind in features was to often handcode the template scripts. > Then you would dump back the template from the kernel and do a diff to > ensure the kernel didn't get something wrong. Essentially, this was a nice > debug feature. During development, we wrote scripts that covered a range of > P4 architectures(PSA, V1, etc) which required no kernel code changes. > > Over time the debug feature morphed into: a) start by handcoding scripts > then b) read it back and then c) generate the P4 code. > It means one could start with the template scripts outside of the > constraints of a P4 architecture spec(PNA/PSA) or even within a P4 > architecture then test some ideas and eventually feed back the concepts to > the compiler authors or modify or create a new P4 architecture and share > with the P4 standards folks. > > To summarize in presence of eBPF: The debugging idea is probably still > alive. One could dump, with proper tooling(bpftool for example), the > loaded eBPF code and be able to check for differences. But this is not the > interesting part. > The concept of going back from whats in the kernel to P4 is a lot more > difficult to implement mostly due to scoping of DSL vs general purpose. It > may be lost. We have been discussing ways to use BTF and embedding > annotations in the eBPF code and binary but more thought is required and we > welcome suggestions. > > 6) Supporting per namespace program > > In P4TC every program and its associated objects have unique IDs which are > generated by the compiler. Multiple or the same P4 program(s) can run > independently in different namespaces alongside their appropriate state and > object instance parameterization (despite name or ID collission). > This requirement is still met (by virtue of keeping P4 program control > objects within the TC domain and attaching to a netns). > > __References__ > > [1]https://github.com/p4tc-dev/docs/blob/main/p4-conference-2023/2023P4WorkshopP4TC.pdf > [2]https://github.com/p4tc-dev/docs/blob/main/why-p4tc.md#historical-perspective-for-p4tc > [3]https://2023p4workshop.sched.com/event/1KsAe/p4tc-linux-kernel-p4-implementation-approaches-and-evaluation > [4]https://github.com/p4tc-dev/docs/blob/main/why-p4tc.md#so-why-p4-and-how-does-p4-help-here > [5]https://lore.kernel.org/netdev/20230517110232.29349-3-jhs@xxxxxxxxxxxx/T/#mf59be7abc5df3473cff3879c8cc3e2369c0640a6 > [6]https://lore.kernel.org/netdev/20230517110232.29349-3-jhs@xxxxxxxxxxxx/T/#m783cfd79e9d755cf0e7afc1a7d5404635a5b1919 > [7]https://lore.kernel.org/netdev/20230517110232.29349-3-jhs@xxxxxxxxxxxx/T/#ma8c84df0f7043d17b98f3d67aab0f4904c600469 > [8]https://github.com/p4lang/p4c/tree/main/backends/tc > [9]https://p4.org/ > [10]https://www.intel.com/content/www/us/en/products/details/network-io/ipu/e2000-asic.html > [11]https://www.amd.com/en/accelerators/pensando > [12]https://github.com/sonic-net/DASH/tree/main > [13]https://github.com/p4lang/p4c/tree/main/backends/ebpf > [14]https://netdevconf.info/0x17/sessions/talk/integrating-ebpf-into-the-p4tc-datapath.html > [15]https://dl.acm.org/doi/10.1145/3630047.3630193 > [16]https://lore.kernel.org/netdev/20231216123001.1293639-1-jiri@xxxxxxxxxxx/ > [17.a]https://netdevconf.info/0x13/session.html?talk-tc-u-classifier > [17.b]man tc-u32 > [18]man tc-pedit > [19] https://lore.kernel.org/netdev/20231219181623.3845083-6-victor@xxxxxxxxxxxx/T/#m86e71743d1d83b728bb29d5b877797cb4942e835 > [20.a] https://netdevconf.info/0x16/sessions/talk/your-network-datapath-will-be-p4-scripted.html > [20.b] https://netdevconf.info/0x16/sessions/workshop/p4tc-workshop.html > > -------- > HISTORY > -------- > > Changes in Version 14 > ---------------------- > 1) #UNDEF HWRITE/HREAD and remove unnecessary checks (Paolo) > 2) Remove const cast added in v13 as a result of changes suggested > suggested by Paolo (Marcelo) > 3) Introduce type validate for s8 caught as a result of audit from #1 > 4) S/GFP_KERNEL/GFP_KERNEL_ACCOUNT for types and runtime objects (Paolo) > 5) Syzkaller caught an invalid netlink attribute bug that has existed > since v5! As noted in patch0 we've been running syzkaller for months. > 6) Add Marcelo's reviewed-by for patch 14 and Toke's ACK to the series. > > Changes in Version 13 > ---------------------- > > 1) Remove ops->print() from p4 types (Paolo). > > 2) Use mutex instead of rwlock for dynamic actions since rwlock is > discouraged these days(Paolo). > > 3) Constify action init_ops() ops parameter (Paolo). > > 4) Use struct sk_buff in kfunc instead of struct __sk_buff (Martin) > Use struct xdp_buff in kfunc instead of struct xdp_md (Martin) > > 5) Replace BTF_SET8_START with BTF_KFUNCS_START and replace > BTF_SET8_END with BTF_KFUNCS_END (Martin) > > 6) Add params__sz argument to all kfuncs to guard against future change > to parameter structures being passed between bpf and tc. For kfunc > xdp/bpf_p4tc_entry_create() we already had the max(5) allowed number of > of parameters. To work around this we had to merge two structs together > in order to maintain the number of params to 5 (Martin). > > 7) Add more info on commit log to explain the relation between the kfuncs > and TC for patch #14 (Martin). > > Changes in Version 12 > ---------------------- > > 0) Introduce back 15 patches (v11 had 5) > > 1) From discussions with Daniel: > i) Remove the XDP programs association alltogether. No refcounting. nothing. > ii) Remove prog type tc - everything is now an ebpf tc action. > > 2) s/PAD0/__pad0/g. Thanks to Marcelo. > > 3) Add extack to specify how many entries (N of M) specified in a batch for > any of requested Create/Update/Delete succeeded. Prior to this it would > only tell us the batch failed to complete without giving us details of > which of M failed. Added as a debug aid. > > Changes in Version 11 > ---------------------- > 1) Split the series into two. Original patches 1-5 in this patchset. The rest > will go out after this is merged. > > 2) Change any references of IFNAMSIZ in the action code when referencing the > action name size to ACTNAMSIZ. Thanks to Marcelo. > > Changes in Version 10 > ---------------------- > 1) A couple of patches from the earlier version were clean enough to submit, > so we did. This gave us room to split the two largest patches each into > two. Even though the split is not git-bisactable and really some of it didn't > make much sense (eg spliting a create, and update in one patch and delete and > get into another) we made sure each of the split patches compiled > independently. The idea is to reduce the number of lines of code to review > and when we get sufficient reviews we will put the splits together again. > See patch #12 and #13 as well as patches #7 and #8). > > 2) Add more context in patch 0. Please READ! > > 3) Added dump/delete filters back to the code - we had taken them out in the > earlier patches to reduce the amount of code for review - but in retrospect > we feel they are important enough to push earlier rather than later. > > > Changes In version 9 > --------------------- > > 1) Remove the largest patch (externs) to ease review. > > 2) Break up action patches into two to ease review bringing down the patches > that need more scrutiny to 8 (the first 7 are almost trivial). > > 3) Fixup prefix naming convention to p4tc_xxx for uapi and p4a_xxx for actions > to provide consistency(Jiri). > > 4) Silence sparse warning "was not declared. Should it be static?" for kfuncs > by making them static. TBH, not sure if this is the right solution > but it makes sparse happy and hopefully someone will comment. > > Changes In Version 8 > --------------------- > > 1) Fix all the patchwork warnings and improve our ci to catch them in the future > > 2) Reduce the number of patches to basic max(15) to ease review. > > Changes In Version 7 > ------------------------- > > 0) First time removing the RFC tag! > > 1) Removed XDP cookie. It turns out as was pointed out by Toke(Thanks!) - that > using bpf links was sufficient to protect us from someone replacing or deleting > a eBPF program after it has been bound to a netdev. > > 2) Add some reviewed-bys from Vlad. > > 3) Small bug fixes from v6 based on testing for ebpf. > > 4) Added the counter extern as a sample extern. Illustrating this example because > it is slightly complex since it is possible to invoke it directly from > the P4TC domain (in case of direct counters) or from eBPF (indirect counters). > It is not exactly the most efficient implementation (a reasonable counter impl > should be per-cpu). > > Changes In RFC Version 6 > ------------------------- > > 1) Completed integration from scriptable view to eBPF. Completed integration > of externs integration. > > 2) Small bug fixes from v5 based on testing. > > Changes In RFC Version 5 > ------------------------- > > 1) More integration from scriptable view to eBPF. Small bug fixes from last > integration. > > 2) More streamlining support of externs via kfunc (create-on-miss, etc) > > 3) eBPF linking for XDP. > > There is more eBPF integration/streamlining coming (we are getting close to > conversion from scriptable domain). > > Changes In RFC Version 4 > ------------------------- > > 1) More integration from scriptable to eBPF. Small bug fixes. > > 2) More streamlining support of externs via kfunc (one additional kfunc). > > 3) Removed per-cpu scratchpad per Toke's suggestion and instead use XDP metadata. > > There is more eBPF integration coming. One thing we looked at but is not in this > patchset but should be in the next is use of eBPF link in our loading (see > "challenge #1" further below). > > Changes In RFC Version 3 > ------------------------- > > These patches are still in a little bit of flux as we adjust to integrating > eBPF. So there are small constructs that are used in V1 and 2 but no longer > used in this version. We will make a V4 which will remove those. > The changes from V2 are as follows: > > 1) Feedback we got in V2 is to try stick to one of the two modes. In this version > we are taking one more step and going the path of mode2 vs v2 where we had 2 modes. > > 2) The P4 Register extern is no longer standalone. Instead, as part of integrating > into eBPF we introduce another kfunc which encapsulates Register as part of the > extern interface. > > 3) We have improved our CICD to include tools pointed to us by Simon. See > "Testing" further below. Thanks to Simon for that and other issues he caught. > Simon, we discussed on issue [7] but decided to keep that log since we think > it is useful. > > 4) A lot of small cleanups. Thanks Marcelo. There are two things we need to > re-discuss though; see: [5], [6]. > > 5) We removed the need for a range of IDs for dynamic actions. Thanks Jakub. > > 6) Clarify ambiguity caused by smatch in an if(A) else if(B) condition. We are > guaranteed that either A or B must exist; however, lets make smatch happy. > Thanks to Simon and Dan Carpenter. > > Changes In RFC Version 2 > ------------------------- > > Version 2 is the initial integration of the eBPF datapath. > We took into consideration suggestions provided to use eBPF and put effort into > analyzing eBPF as datapath which involved extensive testing. > We implemented 6 approaches with eBPF and ran performance analysis and presented > our results at the P4 2023 workshop in Santa Clara[see: 1, 3] on each of the 6 > vs the scriptable P4TC and concluded that 2 of the approaches are sensible (4 if > you account for XDP or TC separately). > > Conclusions from the exercise: We lose the simple operational model we had > prior to integrating eBPF. We do gain performance in most cases when the > datapath is less compute-bound. > For more discussion on our requirements vs journeying the eBPF path please > scroll down to "Restating Our Requirements" and "Challenges". > > This patch set presented two modes. > mode1: the parser is entirely based on eBPF - whereas the rest of the > SW datapath stays as _scriptable_ as in Version 1. > mode2: All of the kernel s/w datapath (including parser) is in eBPF. > > The key ingredient for eBPF, that we did not have access to in the past, is > kfunc (it made a big difference for us to reconsider eBPF). > > In V2 the two modes are mutually exclusive (IOW, you get to choose one > or the other via Kconfig). > > Jamal Hadi Salim (15): > net: sched: act_api: Introduce P4 actions list > net/sched: act_api: increase action kind string length > net/sched: act_api: Update tc_action_ops to account for P4 actions > net/sched: act_api: add struct p4tc_action_ops as a parameter to > lookup callback > net: sched: act_api: Add support for preallocated P4 action instances > p4tc: add P4 data types > p4tc: add template API > p4tc: add template pipeline create, get, update, delete > p4tc: add template action create, update, delete, get, flush and dump > p4tc: add runtime action support > p4tc: add template table create, update, delete, get, flush and dump > p4tc: add runtime table entry create and update > p4tc: add runtime table entry get, delete, flush and dump > p4tc: add set of P4TC table kfuncs > p4tc: add P4 classifier > > include/linux/bitops.h | 1 + > include/net/act_api.h | 23 +- > include/net/p4tc.h | 714 +++++++ > include/net/p4tc_types.h | 89 + > include/net/tc_act/p4tc.h | 79 + > include/uapi/linux/p4tc.h | 465 +++++ > include/uapi/linux/pkt_cls.h | 15 + > include/uapi/linux/rtnetlink.h | 18 + > include/uapi/linux/tc_act/tc_p4.h | 11 + > net/sched/Kconfig | 23 + > net/sched/Makefile | 3 + > net/sched/act_api.c | 192 +- > net/sched/cls_api.c | 2 +- > net/sched/cls_p4.c | 305 +++ > net/sched/p4tc/Makefile | 8 + > net/sched/p4tc/p4tc_action.c | 2419 +++++++++++++++++++++++ > net/sched/p4tc/p4tc_bpf.c | 360 ++++ > net/sched/p4tc/p4tc_filter.c | 1012 ++++++++++ > net/sched/p4tc/p4tc_pipeline.c | 700 +++++++ > net/sched/p4tc/p4tc_runtime_api.c | 145 ++ > net/sched/p4tc/p4tc_table.c | 1820 +++++++++++++++++ > net/sched/p4tc/p4tc_tbl_entry.c | 3071 +++++++++++++++++++++++++++++ > net/sched/p4tc/p4tc_tmpl_api.c | 440 +++++ > net/sched/p4tc/p4tc_types.c | 1213 ++++++++++++ > net/sched/p4tc/trace.c | 10 + > net/sched/p4tc/trace.h | 44 + > security/selinux/nlmsgtab.c | 10 +- > 27 files changed, 13156 insertions(+), 36 deletions(-) > create mode 100644 include/net/p4tc.h > create mode 100644 include/net/p4tc_types.h > create mode 100644 include/net/tc_act/p4tc.h > create mode 100644 include/uapi/linux/p4tc.h > create mode 100644 include/uapi/linux/tc_act/tc_p4.h > create mode 100644 net/sched/cls_p4.c > create mode 100644 net/sched/p4tc/Makefile > create mode 100644 net/sched/p4tc/p4tc_action.c > create mode 100644 net/sched/p4tc/p4tc_bpf.c > create mode 100644 net/sched/p4tc/p4tc_filter.c > create mode 100644 net/sched/p4tc/p4tc_pipeline.c > create mode 100644 net/sched/p4tc/p4tc_runtime_api.c > create mode 100644 net/sched/p4tc/p4tc_table.c > create mode 100644 net/sched/p4tc/p4tc_tbl_entry.c > create mode 100644 net/sched/p4tc/p4tc_tmpl_api.c > create mode 100644 net/sched/p4tc/p4tc_types.c > create mode 100644 net/sched/p4tc/trace.c > create mode 100644 net/sched/p4tc/trace.h > > -- > 2.34.1 >