Re: DSL vs low level language WAS(Re: On the NACKs on P4TC patches)

Tom Herbert <tom@xxxxxxxxxx> · Fri, 24 May 2024 09:50:30 -0700

Hi Chris,

P4 was created to support programming the hardware data path in high
end routers, but P4-TC would enable the use of P4 across all Linux
devices. Since this is potentially a lot of code going into the kernel
to support it, I believe it's entirely fair for us to evaluate and
give feedback on the P4 language and its suitability for the broader
user community including environments where there will never be a need
for P4 hardware. Note that I am questioning the design decisions of P4
in the context of supporting a DSL in the kernel via P4-TC; if the
P4->eBPF compiler is used then these concerns are less pertinent.
Nevertheless, I would suggest that the P4 folks take the points being
raised as constructive feedback on the language.

I took a cursory look at several P4 programs including tutorials,
switch code, firewalls, etc. I have particular interest in variable
length headers, so I'll use
https://github.com/jafingerhut/p4-guide/blob/master/checksum/checksum-ipv4-with-options.p4
as a reference.

The first thing I noticed about P4 is that almost everything is
expressed as a bit field. Like bit<8> and bit<32>. I suppose this
arises from the fact that P4 was originally intended to run in non-CPU
hardware where there's no inherent unit of data like bytes. But CPUs
don't work that way; CPUs work with ordinal types of bytes, half words,
words, double words, etc. (__u8, __u16, __u32, __u64). That means that
all mainstream computer languages fundamentally operate on ordinal
types even if the variable types aren't explicitly declared. Someone
programming in P4 needs to map ordinal types to bit fields in P4, so
if they want a __u32 they need to use a bit<32> in P4 (except they're
not exactly equivalent: a __u32 in C is guaranteed to be byte aligned
and I'm assuming in P4 bit<32> is not guaranteed to be byte aligned--
this seems like it might be susceptible to programming errors). I'd
also point out that networking protocols are also defined using
ordinal type fields. There are some exceptions, but for the most part
protocol fields try to be in units of bytes (or octets if you want to
be old school!). I believe life would be easier for the programmer if
they could just define variables and fields with ordinal types. The
fix here seems simple enough: just add typedefs to P4 like "typedef
bit<32> __u32".
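
To make the alignment point concrete, here is a minimal C sketch of what
reading a 32-bit field costs when it is not guaranteed to start on a byte
boundary. The helper names load_u32_aligned and load_bits32 are made up
for illustration and nothing here is taken from P4 or its compilers;
whether a bit<32> can actually land at a non-byte offset in a given P4
header is the assumption under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-aligned case: a network-order 32-bit field that starts on a byte
 * boundary is just four byte loads (or one load plus a byte swap). */
static uint32_t load_u32_aligned(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* General case: a 32-bit field starting at an arbitrary bit offset.
 * Hypothetical helper. It always reads five bytes starting at the byte
 * that holds the first bit, so the buffer must be at least that long. */
static uint32_t load_bits32(const uint8_t *buf, size_t bit_off)
{
    const uint8_t *p;
    unsigned int shift = bit_off % 8;
    uint64_t window = 0;

    assert(buf != NULL);
    p = buf + bit_off / 8;

    /* Gather 40 bits, most significant byte first. */
    for (int i = 0; i < 5; i++)
        window = (window << 8) | p[i];

    /* Drop the bits that follow the field; the cast to 32 bits then
     * drops the `shift` bits that precede it. */
    return (uint32_t)(window >> (8 - shift));
}
```

The second helper is where off-by-a-few-bits mistakes tend to creep in,
which is the class of error the byte-aligned types avoid.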

In the IP header definition there's "varbit<320>  options;". It took
me several seconds to decode this and realize this is space for forty
bytes of IP options (i.e. 8 * 40 == 320). I suppose this follows the
design of using bit fields for everything, but I think this is more
than just an annoyance like the bit fields for ordinal types are.
First off, it's not very readable. I've never heard anyone say that
there's 320 bits of IP options, or seen an RFC specify that. Likewise,
the standard Ethernet MTU is 1500 bytes, not 12,000 bits which would
seem to be how that would be expressed in P4. So this seems very
unreadable to me and potentially prone to errors. The fix for this
also seems easy, why not just add varbyte to P4 so we can do
varbyte<40>, varbyte<87>, varbyte<123>, etc.?
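
For reference, the byte arithmetic hiding behind varbit<320> can be
written out as below. This is only a small C sketch; the constant and
function names are invented for the example.

```c
#include <assert.h>

enum {
    IPV4_MIN_IHL = 5,   /* 20-byte header, no options */
    IPV4_MAX_IHL = 15,  /* largest value of the 4-bit IHL field */
    IPV4_MAX_OPT_BYTES = (IPV4_MAX_IHL - IPV4_MIN_IHL) * 4
};

/* The same quantity expressed both ways. */
static_assert(IPV4_MAX_OPT_BYTES == 40, "40 bytes of IP options");
static_assert(IPV4_MAX_OPT_BYTES * 8 == 320, "i.e. varbit<320>");

/* Options length in bytes for a given IHL, or -1 if IHL is out of range. */
static int ipv4_options_len(unsigned int ihl)
{
    if (ihl < IPV4_MIN_IHL || ihl > IPV4_MAX_IHL)
        return -1;
    return (int)(ihl - IPV4_MIN_IHL) * 4;
}
```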

The next thing I notice about the P4 programs I surveyed is that all
of them seem to define the protocol headers within the program. Every
program seems to have "header ethernet_t" and "header ipv4_t" and
other protocols that are used and protocol constants like Ethertypes
also seem to be spelled out in each program. Sometimes these are in
include files within the program. What I don't see is that P4 has a
standard set of include files for defining protocol headers. For
instance, in Linux C we would just do "#include <linux/if_ether.h>"
and "#include <linux/ip.h>" to get the definitions of the Ethernet
header and IPv4 header. In fact, if someone were to submit a patch to
Netdev that included its own definition of Ethernet or an IP header
structure they would almost certainly get pushback. It's a fundamental
programming principle, not just in networking but pretty much
everywhere, to not continuously redefine common and standard
constructs-- just put common things in header files that can be shared
by multiple programs (to do otherwise substantially increases the
possibility of errors, bloats code, and reduces readability).
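
As a rough illustration of the shared-header approach, the C sketch below
pulls the IPv4 source and destination addresses out of an Ethernet frame
using only the definitions from <linux/if_ether.h> and <linux/ip.h>. The
function name frame_ipv4_addrs and its return convention are made up for
the example; struct ethhdr, struct iphdr, ETH_HLEN and ETH_P_IP come from
the Linux UAPI headers. It assumes an untagged frame (no VLAN).

```c
#include <arpa/inet.h>       /* ntohs() */
#include <assert.h>
#include <linux/if_ether.h>  /* struct ethhdr, ETH_HLEN, ETH_P_IP */
#include <linux/ip.h>        /* struct iphdr */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy saddr/daddr (still in network byte order) out of an untagged
 * Ethernet frame. Returns 0 on success, -1 if the frame is too short
 * or does not carry IPv4. Hypothetical helper. */
static int frame_ipv4_addrs(const uint8_t *frame, size_t len,
                            uint32_t *saddr, uint32_t *daddr)
{
    struct ethhdr eth;
    struct iphdr ip;

    assert(frame && saddr && daddr);
    if (len < ETH_HLEN + sizeof(ip))
        return -1;

    /* memcpy sidesteps unaligned access into the packet buffer. */
    memcpy(&eth, frame, sizeof(eth));
    if (ntohs(eth.h_proto) != ETH_P_IP)
        return -1;

    memcpy(&ip, frame + ETH_HLEN, sizeof(ip));
    if (ip.version != 4 || ip.ihl < 5)
        return -1;

    *saddr = ip.saddr;
    *daddr = ip.daddr;
    return 0;
}
```

Nothing in the function defines a header layout or an Ethertype; all of
that comes from the two includes.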

Marshalling up common definitions into header files that are common in
the P4 development environment seems simple enough (maybe it's already
done?), but I would also point out that Linux has include files that
describe protocol formats and header structures for almost every
protocol under the sun that are well tested. It would be great if
we could somehow leverage that work. For instance, in the P4
samples I looked at srcAddr and dstAddr are defined for IP addresses,
but in linux/ip.h saddr and daddr are the respective field
names. Why not just base the P4 definition on the Linux one? Then when
someone is porting code from Linux to P4 they can use the same field
names-- this makes things a lot easier on the programmer! I'll also
mention that we wrote a little Python script to generate P4 header and
constant definitions from Linux headers. It almost worked; the snag we
hit was that P4 has some limits on nesting structures and unions, so we
couldn't translate some of the C structures to P4 (if you're
interested I can provide the details on the problem we hit).

The IPv4 header checksum code was a real head scratcher for me. Do we
really need to state each field in the IP header just to compute the
checksum? (and not just do this once, but twice :-( ). See code below
for verifyChecksum and updateChecksum.

In C, verifying and setting the IP header checksum is really easy:

if (checksum(iphdr, 0, iphdr->ihl << 2))    /* ihl counts 32-bit words */
    goto bad_csum;

iphdr->check = 0;                           /* zero the field before summing */
iphdr->check = checksum(iphdr, 0, iphdr->ihl << 2);

Relative to the C code, the P4 code seems very convoluted to me and
prone to errors. What if someone accidentally omits a field? What if
fields become slightly out of order? Also, no one would ever describe
the IPv4 checksum as taking the checksum over the IHL, diffserv,
totalLen, ... That is *way* too complicated for an algorithm that is
really simple-- from RFC791: "The checksum field is the 16 bit one's
complement of the one's complement sum of all 16 bit words in the
header.". Reverse engineering the design, the clue seems to be
HashAlgorithm.csum16. Maybe in P4 the IP checksum is just considered
another form of hash, and I suspect the input to hash computation is
specified as a sort of data structure to make things generic (for
instance, how we create a substructure in flow keys in flow_dissector
to compute a SipHash over the TCP and UDP tuple). But, the IPv4
checksum isn't just another hash-- on a host, we need to compute the
checksum for *every* IPv4 packet. This has to be fast and simple; we
can do this in as few as five instructions. So even if the
code below is correct, I have to wonder how easy it is to emit an
efficient executable. Would a compiler easily realize that all the
fields in the pseudo structure are contiguous without holes such that
it can emit those five instructions?
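
For comparison, here is what the RFC791 definition looks like when
written directly against the header bytes. This is a plain, unoptimized
C sketch and not the kernel's implementation; the name ipv4_csum is
invented for the example. It sums the header as 16-bit words, folds the
carries back in, and returns the one's complement, so running it over a
header that already carries a correct checksum yields zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's complement of the one's complement sum of all 16-bit words in
 * hdr[0..len). For an IPv4 header len is ihl * 4, so it is always even.
 * The result is a host-order integer whose high byte corresponds to the
 * first checksum byte on the wire. Hypothetical helper. */
static uint16_t ipv4_csum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

To verify, call it over the header as received and compare the result
with zero. To update, zero bytes 10 and 11 of the header, call it, and
store the result back in those two bytes, high byte first. Note that no
field other than the checksum itself ever has to be named.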

I don't know how prevalent this method of listing all the fields in a
data structure as arguments to a function is in P4, but, by almost any
objective measure, I have to say that the code below is bad and
bloated. Maybe there's a better way to do it in P4, but if there's not
then this is a deficiency in the P4 language.

Tom

control verifyChecksum(inout headers hdr,
                       inout metadata meta)
{
    apply {
        // There is code similar to this in Github repo p4lang/p4c in
        // file testdata/p4_16_samples/flowlet_switching-bmv2.p4
        // However in that file it is only for a fixed length IPv4
        // header with no options.
        verify_checksum(true,
            { hdr.ipv4.version,
                hdr.ipv4.ihl,
                hdr.ipv4.diffserv,
                hdr.ipv4.totalLen,
                hdr.ipv4.identification,
                hdr.ipv4.flags,
                hdr.ipv4.fragOffset,
                hdr.ipv4.ttl,
                hdr.ipv4.protocol,
                hdr.ipv4.srcAddr,
                hdr.ipv4.dstAddr
#ifdef ALLOW_IPV4_OPTIONS
                , hdr.ipv4.options
#endif /* ALLOW_IPV4_OPTIONS */
            },
            hdr.ipv4.hdrChecksum, HashAlgorithm.csum16);
    }
}

control updateChecksum(inout headers hdr,
                       inout metadata meta)
{
    apply {
        update_checksum(true,
            { hdr.ipv4.version,
                hdr.ipv4.ihl,
                hdr.ipv4.diffserv,
                hdr.ipv4.totalLen,
                hdr.ipv4.identification,
                hdr.ipv4.flags,
                hdr.ipv4.fragOffset,
                hdr.ipv4.ttl,
                hdr.ipv4.protocol,
                hdr.ipv4.srcAddr,
                hdr.ipv4.dstAddr
#ifdef ALLOW_IPV4_OPTIONS
                , hdr.ipv4.options
#endif /* ALLOW_IPV4_OPTIONS */
            },
            hdr.ipv4.hdrChecksum, HashAlgorithm.csum16);
    }
}

On Wed, May 22, 2024 at 8:34 PM Tom Herbert <tom@xxxxxxxxxx> wrote:
>
> On Wed, May 22, 2024 at 7:30 PM Chris Sommers
> <chris.sommers@xxxxxxxxxxxx> wrote:
> >
> > > On Wed, May 22, 2024 at 8:54 PM Tom Herbert <tom@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, May 22, 2024 at 5:09 PM Chris Sommers
> > > > <chris.sommers@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > > On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Hi Jamal!
> > > > > > >
> > > > > > > On Tue, 21 May 2024 08:35:07 -0400 Jamal Hadi Salim wrote:
> > > > > > > > At that point(v16) i asked for the series to be applied despite the
> > > > > > > > Nacks because, frankly, the Nacks have no merit. Paolo was not
> > > > > > > > comfortable applying patches with Nacks and tried to mediate. In his
> > > > > > > > mediation effort he asked if we could remove eBPF - and our answer was
> > > > > > > > no because after all that time we have become dependent on it and
> > > > > > > > frankly there was no technical reason not to use eBPF.
> > > > > > >
> > > > > > > I'm not fully clear on who you're appealing to, and I may be missing
> > > > > > > some points. But maybe it will be more useful than hurtful if I clarify
> > > > > > > my point of view.
> > > > > > >
> > > > > > > AFAIU BPF folks disagree with the use of their subsystem, and they
> > > > > > > point out that P4 pipelines can be implemented using BPF in the first
> > > > > > > place.
> > > > > > > To which you reply that you like (a highly dated type of) a netlink
> > > > > > > interface, and (handwavey) ability to configure the data path SW or
> > > > > > > HW via the same interface.
> > > > > >
> > > > > > It's not what I "like" , rather it is a requirement to support both
> > > > > > s/w and h/w offload. The TC model is the traditional approach to
> > > > > > deploy these models. I addressed the same comment you are making above
> > > > > > in #1a and #1b  (https://github.com/p4tc-dev/pushback-patches).
> > > > > >
> > > > > > OTOH, "BPF folks disagree with the use of their subsystem" is a
> > > > > > problematic statement. Is BPF infra for the kernel community or is it
> > > > > > something the ebpf folks can decide, at their whim, to allow who they
> > > > > > like to use or not. We are not changing any BPF code. And there's
> > > > > > already a case where the interfaces are used exactly as we used them
> > > > > > in the conntrack code i pointed to in the page (we literally copied
> > > > > > that code). Why is it ok for conntrack code to use exactly the same
> > > > > > approach but not us?
> > > > > >
> > > > > > > AFAICT there's some but not very strong support for P4TC,
> > > > > >
> > > > > > I dont agree. Paolo asked this question and afaik Intel, AMD (both
> > > > > > build P4-native NICs) and the folks interested in the MS DASH project
> > > > > > responded saying they are in support. Look at who is being Cced. A lot
> > > > > > of these folks who attend biweekly discussion calls on P4TC. Sample:
> > > > > > https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > > > > >
> > > > > +1
> > > > > > > and it
> > > > > > > doesn't benefit or solve any problems of the broader networking stack
> > > > > > > (e.g. expressing or configuring parser graphs in general)
> > > > > > >
> > > > > >
> > > > >
> > > > > Huh? As a DSL, P4 has already been proven to be an extremely effective and popular way to express parse graphs, stack manipulation, and stateful programming. Yesterday, I used the P4TC dev branch to implement something in one sitting, which includes parsing RoCEv2 network stacks. I just cut and pasted P4 code originally written for a P4 ASIC into a working P4TC example to add functionality. It took mere seconds to compile and launch it, and a few minutes to test it. I know of no other workflow which provides such quick turnaround and is so accessible. I'd like it to be as ubiquitous as eBPF itself.
> > > >
> > > > Chris,
> > > >
> > > > When you say "it took mere seconds to compile and launch" are you
> > > > taking into account the ramp up time that it takes to learn P4 and
> > > > become proficient to do something interesting?
> >
> > Hi Tom, thanks for the dialog. To answer your question, it took seconds to compile and deploy, not learn P4. Adding the parsing for several headers took minutes. If you want to compare learning curve, learning to write P4 code and let the framework handle all the painful low-level Linux details is way easier than trying to learn how to write c code for Linux networking. It’s not even close. I’ve written C for 40 years, P4 for 7 years, and dabbled in eBPF so I can attest to the ease of learning and using P4. I’ve onboarded and mentored engineers who barely knew C, to develop complex networking products using P4, and built the automation APIs (REST, gRPC) to manage them. One person can develop an entire commercial product by themselves in months. P4 has expanded the reach of programmers such that both HW and SW engineers can easily learn P4 and become pretty adept at it. I would not expect even experienced c programmers to be able to master Linux internals very quickly. Writing a P4-TC program and injecting it via tc was like magic the first time.
> >
> > > > Considering that P4
> > > > syntax is very different from typical languages that networking
> > > > programmers are typically familiar with, this ramp up time is
> > > > non-zero. OTOH, eBPF is ubiquitous because it's primarily programmed
> > > > in Restricted C-- this makes it easy for many programmers since they
> > > > don't have to learn a completely new language and so the ramp up time
> > > > for the average networking programmer is much less for using eBPF.
> >
> > I think your statement about “typical network programmers” overlooks the fact that since P4 was introduced, it has been taught in many universities to teach networking and possibly enabled a whole new breed of “network engineers” who can solve real problems without even knowing C programming. Without P4 they might never have gone this route. A class in network stack programming using c would have so many prerequisites to even get to parsing, compared to P4, where it could be demonstrated in one lesson. These “networking programmers” are not typical by your standards, but there are many such. They have just as much claim to the title "network programmer” as a C programmer. Similarly, an assembly language programmer is no less than a C or Python programmer. People writing P4 are usually focused on applications, and it is very useful and productive for that. Why should someone have to learn low-level C or eBPF to solve their problem?
>
> Hi Chris,
>
> You're comparing learning a completely new language versus programming
> in a subset of an established language, they're really not comparable.
> When one programs in Restricted-C they just need to understand what
> features of C are supported.
>
> >
> > > >
> > > > This is really the fundamental problem with DSLs, they require
> > > > specialized skill sets in a programming language for a narrow use case
> > > > (and specialized compilers, tool chains, debugging, etc)-- this means
> > > > a DSL only makes sense if there is no other means to accomplish the
> > > > same effects using a commodity language with perhaps a specialized
> > > > library (it's not just in the networking realm, consider the
> > > > advantages of using CUDA-C instead of a DSL for GPUs).
> >
> > A pretty strong opinion, but DSLs arise to fill a need and P4 did so. It's still going strong.
> >
> > > > Personally, I
> > > > don't believe that P4 has yet been proven necessary for programming a
> > > > datapath-- for instance we can program a parser in declarative
> > > > representation in C,
> > > > https://netdevconf.info/0x16/papers/11/High%20Performance%20Programmable%20Parsers.pdf.
> >
> > CPL (slide11) looks like a DSL wrapped in JSON to me. “Solution: Common Parser Language (CPL); Parser representation in declarative .json” So I am confused. It is either a new language a.k.a. DSL, or it's not. Nothing against it, I'm sure it is great, but let's call it what it is.
>
> Correct, it's not a new language. We've since renamed it Common Parser
> Representation.
>
> > We already have parser representations in declarative p4. And it's used and known worldwide. And has a respectable specification, many users and working groups. And it's formally provable (https://github.com/verified-network-toolchain/petr4)
> >
> > > >
> > > > So unless P4 is proven necessary, then I'm doubtful it will ever be a
> > > > ubiquitous way to program the kernel-- it seems much more likely that
> > > > people will continue to use C and eBPF, and for those users that want
> > > > to use P4 they can use P4->eBPF compiler.
> >
> > “ubiquitous way to program the kernel” – is not my goal. I don’t even want to know about the kernel when I am writing p4 - it's just a means to an end. I want to manipulate packets on a Linux host. P4DPDK, P4-eBPF, P4-TC – all let me do that. I LOVE the fact that P4-TC would be available in every Linux distro once upstreamed. It would solve so many deployment issues, benefit from regression testing, etc. So much goodness
> >
> > " and for those users that want to use P4 they can use P4->eBPF compiler." -I'd really like to choose for myself and not have someone make that choice for me. P4-TC checks all the boxes for me.
>
> Sure, but this is a lot of kernel code and that will require support
> and maintenance. It needs to be justified, and the fact that someone
> wants it just to have a choice is, frankly, not much of a
> justification. I think a justification needs to start with "Why isn't
> P4->eBPF sufficient?" (the question has been raised several times, but
> it still doesn't seem like there's a strong answer).
>
> Tom
> >
> > Thanks for the point of view, it's healthy to debate.
> > Cheers,
> > Chris
> >
> > > >
> > >
> > > Tom,
> > > I cant stop the distraction of this thread becoming a discussion on
> > > the merits of DSL vs a lower level language (and I know you are not a
> > > P4 fan) but please change the subject so we dont lose the main focus
> > > which is a discussion on the patches. I have done it for you. Chris if
> > > you wish to respond please respond under the new thread subject.
> > >
> > > cheers,
> > > jamal
> >