RE: [Bpf] Standardizing BPF assembly language?

dthaler1968@xxxxxxxxxxxxxx · Tue, 23 Jan 2024 15:15:09 -0800

> -----Original Message-----
> From: David Vernet <void@xxxxxxxxxxxxx>
> Sent: Tuesday, January 23, 2024 1:52 PM
> To: dthaler1968@xxxxxxxxxxxxxx
> Cc: bpf@xxxxxxxx; bpf@xxxxxxxxxxxxxxx; jose.marchesi@xxxxxxxxxx
> Subject: Re: [Bpf] Standardizing BPF assembly language?
> 
> On Tue, Jan 23, 2024 at 01:41:10PM -0800, dthaler1968@xxxxxxxxxxxxxx
> wrote:
> > > -----Original Message-----
> > > From: David Vernet <void@xxxxxxxxxxxxx>
> > > Sent: Tuesday, January 23, 2024 1:31 PM
> > > To: dthaler1968@xxxxxxxxxxxxxx
> > > Cc: bpf@xxxxxxxx; bpf@xxxxxxxxxxxxxxx; jose.marchesi@xxxxxxxxxx
> > > Subject: Re: [Bpf] Standardizing BPF assembly language?
> > >
> > > On Tue, Jan 23, 2024 at 08:45:32AM -0800,
> > > dthaler1968=40googlemail.com@xxxxxxxxxxxxxx wrote:
> > > > At LSF/MM/BPF 2023, Jose gave a presentation about BPF assembly
> > > > language
> > > > > (http://vger.kernel.org/bpfconf2023_material/compiled_bpf.txt).
> > > >
> > > > Jose wrote in that link:
> > > > > There are two dialects of BPF assembler in use today:
> > > > >
> > > > > - A "pseudo-c" dialect (originally "BPF verifier format")
> > > > >  : r1 = *(u64 *)(r2 + 0x00f0)
> > > > >  : if r1 > 2 goto label
> > > > >  : lock *(u32 *)(r2 + 10) += r3
> > > > >
> > > > > - An "assembler-like" dialect
> > > > >  : ldxdw %r1, [%r2 + 0x00f0]
> > > > >  : jgt %r1, 2, label
> > > > >  : xaddw [%r2 + 2], r3
> > > >
> > > > > During Jose's talk, I discovered that uBPF didn't quite match the
> > > > second dialect and submitted a bug report.  By the time the
> > > > conference was over, uBPF had been updated to match GCC, so that
> > > > discussion worked to reduce the number of variants.
> > > >
> > > > As more instructions get added and supported by more tools and
> > > > > compilers there's the risk of even more variants unless it's
> > > > > standardized.
> > > >
> > > > Hence I'd recommend that BPF assembly language get documented in
> > > > > some WG draft.  If folks agree with that premise, the first
> > > > > question is then: which document?
> > >
> > > > One possible answer would be the ISA document that specifies the
> > > > > instructions, since that way the IANA registry could list the
> > > > assembly for each instruction, and any future documents that add
> > > > instructions would necessarily need to specify the assembly for
> > > > them, preventing variants from springing up for new instructions.
> > >
> > > > I'm not opposed to this, but would strongly prefer that we do it as
> > > > an extension if we go this route to avoid scope creep for the first
> > > > iteration.
> >
> > If the first iteration does not have it, then presumably the initial
> > IANA registry would not have it either, since this iteration creates
> > the registry and the rules for it.
> >
> > That's doable, but may continue to proliferate more and more variants
> > until it is addressed.
> 
> The same could be said for any new instructions that are added while we
> sort out standardizing the assembly language as well, no?

Yes, that was my point.  If the initial ISA spec at time of publication
includes the assembly language then there's no issue.

Not saying we have to wait, just that which document to put it in is
what the WG should agree on, in my view.

> > If it's in another document, do you agree it would still fall under
> > the existing charter bullet about "defining the instructions"
> > > [PS] the BPF instruction set architecture (ISA) that defines the
> > > instructions and low-level virtual machine for BPF programs,
> > ?
> 
> I wouldn't say it's illogical to group assembly language in this bucket,
> but I would say that defining the assembly language does not need to be
> tied at the hip with defining instruction encodings and semantics. So my
> answer is "yes, I think it belongs here", but I also don't think it's
> necessary or desirable for the first iteration.
> 
> > > > A second question would be, which dialect(s) to standardize.
> > > > Jose's link above argues that the second dialect should be the one
> > > > standardized (tools are free to support multiple dialects for
> > > > backwards compat if they want).  See the link for rationale.
> > >
> > > > My recollection was that the outcome of that discussion is that we
> > > > were going to continue to support both. If we wanted to standardize,
> > > > I have a hard time seeing any other way other than to standardize
> > > > both dialects unless there's been a significant change in sentiment
> > > > since LSFMM.
> >
> > If "standardize both", does that mean neither is mandatory and each
> > tool is free to pick one or the other?  And would the IANA registry
> > require a document adding any new instructions to specify the assembly
> > in both dialects?
> 
> Well, if we're standardizing on both, then yes I think it would be
> mandatory for a tool to support both, and I think instructions would
> require assembly for both dialects. Practically speaking that's already
> what's happening, no? Both dialects are already pervasive, so it seems
> unlikely that a tool would succeed without supporting both regardless.

There are plenty of counterexamples of things that exist (whether they
"succeed" or not depends on the definition of succeed) that support or
supported neither.  E.g., uBPF prior to Jose's talk.

> To Jose's point (pasted below), there are of course drawbacks:
> 
> > - Expensive :: it makes it very difficult to reuse infrastructure.
> > - Problematic :: dis/assemblers, CGEN, LaTeX, editors, IDEs, etc.
> > - Ambiguous :: with both GAS and llvm/MCParser: symbol assignments.
> > - Pervasive :: because of the inline asm.
> 
> I think it would be a lot simpler to standardize on only a single dialect,
> but I also think the standard should reflect how BPF is being used in
> practice.

Dave