Hi Chris,

P4 was created to support programming the hardware data path in high end routers, but P4-TC would enable the use of P4 across all Linux devices. Since this is potentially a lot of code going into the kernel to support it, I believe it's entirely fair for us to evaluate and give feedback on the P4 language and its suitability for the broader user community, including environments where there will never be a need for P4 hardware. Note that I am questioning the design decisions of P4 in the context of supporting a DSL in the kernel via P4-TC; if the P4->eBPF compiler is used then these concerns are less pertinent. Nevertheless, I would suggest that the P4 folks take the points being raised as constructive feedback on the language.

I took a cursory look at several P4 programs including tutorials, switch code, firewalls, etc. I have particular interest in variable length headers, so I'll use https://github.com/jafingerhut/p4-guide/blob/master/checksum/checksum-ipv4-with-options.p4 as a reference.

The first thing I noticed about P4 is that almost everything is expressed as a bit field, like bit<8> and bit<32>. I suppose this arises from the fact that P4 was originally intended to run in non-CPU hardware where there's no inherent unit of data like bytes. But CPUs don't work that way; CPUs work with ordinal types of bytes, half words, words, double words, etc. (__u8, __u16, __u32, __u64). That means that all mainstream computer languages fundamentally operate on ordinal types, even if the variable types aren't explicitly declared. Someone programming in P4 needs to map ordinal types to bit fields, so if they want a __u32 they need to use a bit<32> in P4 (except they're not exactly equivalent: a __u32 in C is guaranteed to be byte aligned, and I'm assuming a bit<32> in P4 is not guaranteed to be byte aligned, which seems like it might be susceptible to programming errors).
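To illustrate the alignment point, here is a minimal C sketch of what reading a field costs when it may start at any bit offset, compared with a byte-aligned 32-bit field. The helper names get_bits() and get_be32() are hypothetical and are not taken from P4, p4c, or the kernel; this only shows the general shape of the problem on a CPU, not what any P4 backend actually generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a `width`-bit big-endian field (1..32 bits)
 * that starts `bit_off` bits into `buf`. Nothing guarantees byte
 * alignment, so the value is assembled one bit at a time. */
static uint32_t get_bits(const uint8_t *buf, size_t bit_off, unsigned width)
{
    uint32_t v = 0;

    assert(width >= 1 && width <= 32);
    for (unsigned i = 0; i < width; i++) {
        size_t bit = bit_off + i;

        v = (v << 1) | ((buf[bit / 8] >> (7 - bit % 8)) & 1u);
    }
    return v;
}

/* Hypothetical helper: read a byte-aligned 32-bit field in network
 * byte order. Four byte loads, no per-bit work. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

When the field happens to be byte aligned both helpers return the same value, but only the second one can rely on that at compile time.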
I'd also point out that networking protocols are defined using ordinal type fields as well. There are some exceptions, but for the most part protocol fields try to be in units of bytes (or octets if you want to be old school!). I believe life would be easier for the programmer if they could just define variables and fields with ordinal types. The fix here seems simple enough: just add typedefs to P4 like "typedef bit<32> __u32".

In the IP header definition there's "varbit<320> options;". It took me several seconds to decode this and realize that it is space for forty bytes of IP options (i.e. 8 * 40 == 320). I suppose this follows the design of using bit fields for everything, but I think this is more than just an annoyance like the bit fields for ordinal types are. First off, it's not very readable. I've never heard anyone say that there are 320 bits of IP options, or seen an RFC specify that. Likewise, the standard Ethernet MTU is 1500 bytes, not 12,000 bits, which would seem to be how that would be expressed in P4. So this seems very unreadable to me and potentially prone to errors. The fix for this also seems easy: why not just add varbyte to P4 so we can do varbyte<40>, varbyte<87>, varbyte<123>, etc.?

The next thing I noticed about the P4 programs I surveyed is that all of them seem to define the protocol headers within the program. Every program seems to have "header ethernet_t" and "header ipv4_t" and definitions for whatever other protocols are used, and protocol constants like Ethertypes also seem to be spelled out in each program. Sometimes these are in include files within the program. What I don't see is a standard set of P4 include files for defining protocol headers. For instance, in Linux C we would just do "#include <linux/if_ether.h>" and "#include <linux/ip.h>" to get the definitions of the Ethernet header and IPv4 header.
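Going back to the varbit<320> arithmetic above, this is where the numbers come from, as a small C sketch. The names ipv4_options_bytes() and bytes_to_bits() are hypothetical helpers written for this mail; the only facts assumed are that IHL is a 4-bit count of 32-bit words and that the fixed IPv4 header is 20 bytes.

```c
#include <assert.h>

enum {
    IPV4_MIN_IHL = 5,   /* 5 words == 20-byte header, no options */
    IPV4_MAX_IHL = 15,  /* IHL is a 4-bit field, so 15 words at most */
};

/* Hypothetical helper: bytes of IPv4 options implied by an IHL value.
 * IHL counts 32-bit words, so each step above 5 adds 4 bytes. */
static unsigned ipv4_options_bytes(unsigned ihl)
{
    assert(ihl >= IPV4_MIN_IHL && ihl <= IPV4_MAX_IHL);
    return (ihl - IPV4_MIN_IHL) * 4;
}

/* Hypothetical helper: the conversion a reader has to do in their head
 * to get from a byte count to a P4 varbit<> width. */
static unsigned bytes_to_bits(unsigned nbytes)
{
    return nbytes * 8;
}
```

With IHL at its maximum of 15 this gives 40 bytes of options, and 40 bytes is the 320 bits that appears in the P4 declaration.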
In fact, if someone were to submit a patch to Netdev that included its own definition of an Ethernet or IP header structure, they would almost certainly get pushback. It's a fundamental programming principle, not just in networking but pretty much everywhere, to not continuously redefine common and standard constructs. Just put common things in header files that can be shared by multiple programs (to do otherwise substantially increases the possibility of errors, bloats code, and reduces readability).

Marshalling up common definitions into header files that are shared across the P4 development environment seems simple enough (maybe it's already done?), but I would also point out that Linux has include files that describe protocol formats and header structures for almost every protocol under the sun, and they are well tested. It would be great if we could somehow leverage that work. For instance, in the P4 samples I looked at, srcAddr and dstAddr are defined for IP addresses, but in linux/ip.h saddr and daddr are the respective field names. Why not just base the P4 definition on the Linux one? Then when someone is porting code from Linux to P4 they can use the same field names, which makes things a lot easier on the programmer!

I'll also mention that we wrote a little Python script to generate P4 header and constant definitions from Linux headers. It almost worked; the snag we hit was that P4 has some limits on nesting structures and unions, so we couldn't translate some of the C structures to P4 (if you're interested I can provide the details on the problem we hit).

The IPv4 header checksum code was a real head scratcher for me. Do we really need to state each field in the IP header just to compute the checksum? (And not just do this once, but twice :-( ). See code below for verifyChecksum and updateChecksum.
In C, verifying and setting the IP header checksum is really easy:

    if (checksum(iphdr, 0, iphdr->ihl << 2))
        goto bad_csum;

    iphdr->check = 0;
    iphdr->check = checksum(iphdr, 0, iphdr->ihl << 2);

(IHL counts 32-bit words, so the header length in bytes is ihl << 2, and the checksum field has to be zero while the new value is computed.)

Relative to the C code, the P4 code seems very convoluted to me and prone to errors. What if someone accidentally omits a field? What if fields become slightly out of order? Also, no one would ever describe the IPv4 checksum as taking the checksum over the IHL, diffserv, totalLen, ... That is *way* too complicated for an algorithm that is really simple. From RFC791: "The checksum field is the 16 bit one's complement of the one's complement sum of all 16 bit words in the header.".

Reverse engineering the design, the clue seems to be HashAlgorithm.csum16. Maybe in P4 the IP checksum is just considered another form of hash, and I suspect the input to the hash computation is specified as a sort of data structure to make things generic (similar to how we create a substructure in flow keys in flow_dissector to compute a SipHash over the TCP and UDP tuple). But the IPv4 checksum isn't just another hash: on a host, we need to compute the checksum for *every* IPv4 packet. This has to be fast and simple, and we can do it in as few as five instructions. So even if the code below is correct, I have to wonder how easy it is to emit an efficient executable. Would a compiler easily realize that all the fields in the pseudo structure are contiguous without holes, such that it can emit those five instructions?

I don't know how prevalent this method of listing all the fields in a data structure as arguments to a function is in P4, but by almost any objective measure I have to say that the code below is bad and bloated. Maybe there's a better way to do it in P4, but if there's not then this is a deficiency in the P4 language.
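For reference, here is a minimal, portable C sketch of the RFC 791 checksum described above, operating on the raw header bytes so that options are covered automatically. The function names (ip_checksum, ipv4_hdr_len, ipv4_csum_ok, ipv4_csum_set) are hypothetical and written for this mail; this is not the kernel's ip_fast_csum(), and it makes no attempt at the hand-tuned instruction sequences the kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's complement of the one's complement sum of all 16-bit words
 * (RFC 791). `len` is in bytes and must be even, which always holds
 * for an IPv4 header since its length is a multiple of 4. */
static uint16_t ip_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Header length in bytes: IHL (low nibble of byte 0) counts 32-bit words. */
static size_t ipv4_hdr_len(const uint8_t *hdr)
{
    return (size_t)(hdr[0] & 0x0f) * 4;
}

/* A valid header, checksum field included, sums to zero. */
static int ipv4_csum_ok(const uint8_t *hdr)
{
    return ip_checksum(hdr, ipv4_hdr_len(hdr)) == 0;
}

/* Zero the checksum field (bytes 10 and 11), then store the new value
 * in network byte order. */
static void ipv4_csum_set(uint8_t *hdr)
{
    uint16_t c;

    hdr[10] = 0;
    hdr[11] = 0;
    c = ip_checksum(hdr, ipv4_hdr_len(hdr));
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)(c & 0xff);
}
```

Nothing here names an individual header field other than IHL and the checksum itself; the length derived from IHL is the only thing that changes when options are present.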
Tom

control verifyChecksum(inout headers hdr, inout metadata meta) {
    apply {
        // There is code similar to this in Github repo p4lang/p4c in
        // file testdata/p4_16_samples/flowlet_switching-bmv2.p4
        // However in that file it is only for a fixed length IPv4
        // header with no options.
        verify_checksum(true,
            { hdr.ipv4.version,
              hdr.ipv4.ihl,
              hdr.ipv4.diffserv,
              hdr.ipv4.totalLen,
              hdr.ipv4.identification,
              hdr.ipv4.flags,
              hdr.ipv4.fragOffset,
              hdr.ipv4.ttl,
              hdr.ipv4.protocol,
              hdr.ipv4.srcAddr,
              hdr.ipv4.dstAddr
#ifdef ALLOW_IPV4_OPTIONS
              , hdr.ipv4.options
#endif /* ALLOW_IPV4_OPTIONS */
            },
            hdr.ipv4.hdrChecksum, HashAlgorithm.csum16);
    }
}

control updateChecksum(inout headers hdr, inout metadata meta) {
    apply {
        update_checksum(true,
            { hdr.ipv4.version,
              hdr.ipv4.ihl,
              hdr.ipv4.diffserv,
              hdr.ipv4.totalLen,
              hdr.ipv4.identification,
              hdr.ipv4.flags,
              hdr.ipv4.fragOffset,
              hdr.ipv4.ttl,
              hdr.ipv4.protocol,
              hdr.ipv4.srcAddr,
              hdr.ipv4.dstAddr
#ifdef ALLOW_IPV4_OPTIONS
              , hdr.ipv4.options
#endif /* ALLOW_IPV4_OPTIONS */
            },
            hdr.ipv4.hdrChecksum, HashAlgorithm.csum16);
    }
}

On Wed, May 22, 2024 at 8:34 PM Tom Herbert <tom@xxxxxxxxxx> wrote:
>
> On Wed, May 22, 2024 at 7:30 PM Chris Sommers
> <chris.sommers@xxxxxxxxxxxx> wrote:
> >
> > > On Wed, May 22, 2024 at 8:54 PM Tom Herbert <mailto:tom@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, May 22, 2024 at 5:09 PM Chris Sommers
> > > > <mailto:chris.sommers@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > > On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <mailto:kuba@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Hi Jamal!
> > > > > > >
> > > > > > > On Tue, 21 May 2024 08:35:07 -0400 Jamal Hadi Salim wrote:
> > > > > > > > At that point(v16) i asked for the series to be applied despite the
> > > > > > > > Nacks because, frankly, the Nacks have no merit. Paolo was not
> > > > > > > > comfortable applying patches with Nacks and tried to mediate. In his
> > > > > > > > mediation effort he asked if we could remove eBPF - and our answer was
> > > > > > > > no because after all that time we have become dependent on it and
> > > > > > > > frankly there was no technical reason not to use eBPF.
> > > > > > >
> > > > > > > I'm not fully clear on who you're appealing to, and I may be missing
> > > > > > > some points. But maybe it will be more useful than hurtful if I clarify
> > > > > > > my point of view.
> > > > > > >
> > > > > > > AFAIU BPF folks disagree with the use of their subsystem, and they
> > > > > > > point out that P4 pipelines can be implemented using BPF in the first
> > > > > > > place.
> > > > > > > To which you reply that you like (a highly dated type of) a netlink
> > > > > > > interface, and (handwavey) ability to configure the data path SW or
> > > > > > > HW via the same interface.
> > > > > >
> > > > > > It's not what I "like" , rather it is a requirement to support both
> > > > > > s/w and h/w offload. The TC model is the traditional approach to
> > > > > > deploy these models. I addressed the same comment you are making above
> > > > > > in #1a and #1b (https://urldefense.com/v3/__https://github.com/p4tc-dev/pushback-patches__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs0-w2zKP2A$).
> > > > > >
> > > > > > OTOH, "BPF folks disagree with the use of their subsystem" is a
> > > > > > problematic statement. Is BPF infra for the kernel community or is it
> > > > > > something the ebpf folks can decide, at their whim, to allow who they
> > > > > > like to use or not. We are not changing any BPF code. And there's
> > > > > > already a case where the interfaces are used exactly as we used them
> > > > > > in the conntrack code i pointed to in the page (we literally copied
> > > > > > that code). Why is it ok for conntrack code to use exactly the same
> > > > > > approach but not us?
> > > > > >
> > > > > > > AFAICT there's some but not very strong support for P4TC,
> > > > > >
> > > > > > I dont agree. Paolo asked this question and afaik Intel, AMD (both
> > > > > > build P4-native NICs) and the folks interested in the MS DASH project
> > > > > > responded saying they are in support. Look at who is being Cced. A lot
> > > > > > of these folks who attend biweekly discussion calls on P4TC. Sample:
> > > > > > https://urldefense.com/v3/__https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs09TFzoQBw$
> > > > >
> > > > > > +1
> > > > > > > and it
> > > > > > > doesn't benefit or solve any problems of the broader networking stack
> > > > > > > (e.g. expressing or configuring parser graphs in general)
> > > > > > >
> > > > > >
> > > > >
> > > > > Huh? As a DSL, P4 has already been proven to be an extremely effective and popular way to express parse graphs, stack manipulation, and stateful programming. Yesterday, I used the P4TC dev branch to implement something in one sitting, which includes parsing RoCEv2 network stacks. I just cut and pasted P4 code originally written for a P4 ASIC into a working P4TC example to add functionality. It took mere seconds to compile and launch it, and a few minutes to test it. I know of no other workflow which provides such quick turnaround and is so accessible. I'd like it to be as ubiquitous as eBPF itself.
> > > >
> > > > Chris,
> > > >
> > > > When you say "it took mere seconds to compile and launch" are you
> > > > taking into account the ramp up time that it takes to learn P4 and
> > > > become proficient to do something interesting?
> >
> > Hi Tom, thanks for the dialog. To answer your question, it took seconds to compile and deploy, not learn P4. Adding the parsing for several headers took minutes. If you want to compare learning curve, learning to write P4 code and let the framework handle all the painful low-level Linux details is way easier than trying to learn how to write c code for Linux networking. It’s not even close. I’ve written C for 40 years, P4 for 7 years, and dabbled in eBPF so I can attest to the ease of learning and using P4. I’ve onboarded and mentored engineers who barely knew C, to develop complex networking products using P4, and built the automation APIs (REST, gRPC) to manage them. One person can develop an entire commercial product by themselves in months. P4 has expanded the reach of programmers such that both HW and SW engineers can easily learn P4 and become pretty adept at it. I would not expect even experienced c programmers to be able to master Linux internals very quickly. Writing a P4-TC program and injecting it via tc was like magic the first time.
> >
> > > > Considering that P4
> > > > syntax is very different from typical languages than networking
> > > > programmers are typically familiar with, this ramp up time is
> > > > non-zero. OTOH, eBPF is ubiquitous because it's primarily programmed
> > > > in Restricted C-- this makes it easy for many programmers since they
> > > > don't have to learn a completely new language and so the ramp up time
> > > > for the average networking programmer is much less for using eBPF.
> >
> > I think your statement about “typical network programmers” overlooks the fact that since P4 was introduced, it has been taught in many universities to teach networking and possibly enabled a whole new breed of “network engineers” who can solve real problems without even knowing C programming. Without P4 they might never have gone this route. A class in network stack programming using c would have so many prerequisites to even get to parsing, compared to P4, where it could be demonstrated in one lesson. These “networking programmers” are not typical by your standards, but there are many such. They have just as much claim to the title "network programmer” as a C programmer. Similarly, an assembly language programmer is no less than a C or Python programmer. People writing P4 are usually focused on applications, and it is very useful and productive for that. Why should someone have to learn low-level C or eBPF to solve their problem?
>
> Hio Chris,
>
> You're comparing learning a completely new language versus programming
> in a subset of an established language, they're really not comparable.
> When one programs in Restricted-C they just need to understand what
> features of C are supported.
>
> >
> > > >
> > > > This is really the fundamental problem with DSLs, they require
> > > > specialized skill sets in a programming language for a narrow use case
> > > > (and specialized compilers, tool chains, debugging, etc)-- this means
> > > > a DSL only makes sense if there is no other means to accomplish the
> > > > same effects using a commodity language with perhaps a specialized
> > > > library (it's not just in the networking realm, consider the
> > > > advantages of using CUDA-C instead of a DLS for GPUs).
> >
> > A pretty strong opinion, but DSLs arise to fill a need and P4 did so. It's still going strong.
> >
> > > > Personally, I
> > > > don't believe that P4 has yet to be proven necessary for programming a
> > > > datapath-- for instance we can program a parser in declarative
> > > > representation in C,
> > > > https://urldefense.com/v3/__https://netdevconf.info/0x16/papers/11/High*20Performance*20Programmable*20Parsers.pdf__;JSUl!!I5pVk4LIGAfnvw!m9zrSDvddfzSt_sMBjOEvqw31RzAwWlEDM4ah5IJ2kqsmq6XtPIVJd-1_ZoGWBXKLyda77RYLvGR83Ginw$.
> >
> > CPL (slide11) looks like a DSL wrapped in JSON to me. “Solution: Common Parser Language (CPL); Parser representation in declarative .json” So I am confused. It is either a new language a.k.a. DSL, or it's not. Nothing against it, I'm sure it is great, but let's call it what it is.
>
> Correct, it's not a new language. We've since renamed it Common Parser
> Representation.
>
> > We already have parser representations in declarative p4. And it's used and known worldwide. And has a respectable specification, any users and working groups. And it's formally provable (https://github.com/verified-network-toolchain/petr4)
> >
> > > >
> > > > So unless P4 is proven necessary, then I'm doubtful it will ever be a
> > > > ubiquitous way to program the kernel-- it seems much more likely that
> > > > people will continue to use C and eBPF, and for those users that want
> > > > to use P4 they can use P4->eBPF compiler.
> >
> > “ubiquitous way to program the kernel” – is not my goal. I don’t even want to know about the kernel when I am writing p4 - it's just a means to an end. I want to manipulate packets on a Linux host. P4DPDK, P4-eBPF, P4-TC – all let me do that. I LOVE the fact that P4-TC would be available in every Linux distro once upstreamed. It would solve so many deployment issues, benefit from regression testing, etc. So much goodness
> >
> > " and for those users that want to use P4 they can use P4->eBPF compiler." -I'd really like to choose for myself and not have someone make that choice for me. P4-TC checks all the boxes for me.
>
> Sure, but this is a lot of kernel code and that will require support
> and maintenance. It needs to be justified, and the fact that someone
> wants it just to have a choice is, frankly, not much of a
> justification. I think a justification needs to start with "Why isn't
> P4->eBPF sufficient?" (the question has been raised several times, but
> it still doesn't seem like there's a strong answer).
>
> Tom
> >
> > Thanks for the point of view, it's healthy to debate.
> > Cheers,
> > Chris
> >
> > >
> > > Tom,
> > > I cant stop the distraction of this thread becoming a discussion on
> > > the merits of DSL vs a lower level language (and I know you are not a
> > > P4 fan) but please change the subject so we dont loose the main focus
> > > which is a discussion on the patches. I have done it for you. Chris if
> > > you wish to respond please respond under the new thread subject.
> > >
> > > cheers,
> > > jamal
> >