Re: [xdp-hints] Re: [PATCH RFC bpf-next 5/9] xdp: controlling XDP-hints from BPF-prog via helper

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Fri, 01 Jul 2022 14:09:14 +0200

Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx> writes:

> On 29/06/2022 16.20, Toke Høiland-Jørgensen wrote:
>> Jesper Dangaard Brouer <brouer@xxxxxxxxxx> writes:
>> 
>>> XDP BPF-prog's need a way to interact with the XDP-hints. This patch
>>> introduces a BPF-helper function, that allow XDP BPF-prog's to interact
>>> with the XDP-hints.
>>>
>>> BPF-prog can query if any XDP-hints have been setup and if this is
>>> compatible with the xdp_hints_common struct. If XDP-hints are available
>>> the BPF "origin" is returned (see enum xdp_hints_btf_origin) as BTF can
>>> come from different sources or origins e.g. vmlinux, module or local.
>> 
>> I'm not sure I quite understand what this origin is supposed to be good
>> for? 
>
> Some background info on BTF is needed here: BTF_ID numbers are not
> globally unique identifiers, thus we need to know where it originate
> from, to make it unique (as we store this BTF_ID in XDP-hints).
>
> There is a connection between origin "vmlinux" and "module", which is
> that vmlinux will start at ID=1 and end at a max ID number.  Modules
> refer to ID's in "vmlinux", and for this to work, they will shift their
> own numbering to start after ID=max-vmlinux-id.
>
> Origin "local" is for BTF information stored in the BPF-ELF object file.
> Their numbering starts at ID=1.  The use-case is that a BPF-prog want to
> extend the kernel drivers BTF-layout, and e.g. add a RX-timestamp like
> [1].  Then BPF-prog can check if it knows module's BTF_ID and then
> extend via bpf_xdp_adjust_meta, and update BTF_ID in XDP-hints and call
> the helper (I introduced) marking this as origin "local" for kernel to
> know this is no-longer origin "module".

Right, I realise that :)

My point was that just knowing "this is a BTF ID coming from a module"
is not terribly useful; you could already figure that out by just
looking at the ID and seeing if it's larger than the maximum ID in
vmlinux BTF.

Rather, what we need is a way to identify *which* module the BTF ID
comes from; and luckily, the kernel assigns a unique ID to every BTF
*object* as well as to each type ID within that object. These can be
dumped by bpftool:

# bpftool btf
bpftool btf      
[sudo] password for alrua: 
1: name [vmlinux]  size 4800187B
2: name [serio]  size 2588B
3: name [i8042]  size 11786B
4: name [rng_core]  size 8184B
[...]
2062: name <anon>  size 36965B
	pids bpftool(547298)

IDs 2-4 are module BTF objects, and that last one is the ID of a BTF
object loaded along with a BPF program by bpftool itself... So we *do*
in fact have a unique ID, by combining the BTF object ID with the type
ID; this is what Alexander is proposing to put into the xdp-hints struct
as well (combining the two IDs into a single u64).

>> What is a BPF (or AF_XDP) program supposed to do with the
>> information "this XDP hints struct came from a module?" without knowing
>> which module that was? 
>
> For AF_XDP my claim is the userspace program will already know that
> driver it is are talking to because it need to "bind" to a specific
> interface (and attach to XDP-prog to ifindex). See sample code[2] for
> get_driver_name from ifindex.
> Thus, part of using XDP-hints already involves (resolving and) opening
> /sys/kernel/btf/driver_name.  So the origin "module" is enough for the
> API end-user to make the BTF_ID unique.

This will probably work in the most common cases, but offers no way to
verify that this "offline" resolution of module ID is actually correct.
Explicitly encoding the full unique ID will be more robust.

> Runtime the BPF-prog and kernel can find out what net_device the origin
> "module" refers to via xdp_buff->rxq->dev.  When an end-user/program
> attach XDP they also need to know the ifindex, again giving them
> knowledge that make origin "module" BTF_ID's unique for them,

Right, but then the BPF program needs to keep its own lookup table from
ifindex to BTF ID? If we just encode the full ID in the packet, it's a
simple check, and we can likely create a "magic" CO-RE macro that turns
a struct definition into the right ID check at load time...

>> Ultimately, the origin is useful for a consumer
>> to check that the metadata is in the format that it's expecting it to be
>> in (so it can just load the data from the appropriate offsets). But to
>> answer this, we really need a unique identifier; so I think the approach
>> in Alexander's series of encoding the ID of the BTF structure itself
>> into the next 32 bits is better? That way we'll have a unique "pointer"
>> to the actual struct that's in the metadata area and can act on this.
>
> I would really like an explanation from Alexander, how his approach
> creates unique identifier across all kernel modules.  I don't get it
> from reading the code.  To me it looks like some extra BTF "type"
> information about the BTF_ID.
>
> E.g. how do BTF "local" BPF-ELF object's get a unique identifier, that
> doesn't overlap with e.g. kernel modules?

See above: the kernel generates a unique (until the next reboot) ID for
every BTF object when it's loaded into the kernel.

>>> RFC/TODO: Improve patch: Can verifier validate provided BTF on "update"
>>> and detect if compatible with common struct???
>> 
>> If we have the unique ID as mentioned above, I think the kernel probably
>> could resolve this automatically: whenever a module is loaded, the
>> kernel could walk the BTF information from that module an simply inspect
>> all the metadata structs and see if they contain the embedded
>> xdp_hints_common struct. The IDs of any metadata structs that do contain
>> the common struct can then be kept in a central lookup table and the
>> consumption code can then simply compare the BTF ID to this table when
>> building an SKB?
>
> I'm not against the idea for the kernel to keep track of these structs.
> I just don't like the idea of checking this runtime, especially as this
> approach for walking all other modules BTF struct's doesn't scale.

Yeah, we should optimise this. See below...

>> As for the validation on the BPF side:n
>> 
>>> +	if (flags & HINTS_BTF_UPDATE) {
>>> +		is_compat_common = !!(flags & HINTS_BTF_COMPAT_COMMON);
>>> +	/* TODO: Can kernel validate if hints are BTF compat with common? */
>>> +	/* TODO: Could BPF prog provide BTF as ARG_PTR_TO_BTF_ID to prove compat_common ? */
>> 
>> If we use the "global ID + lookup table" approach above, we don't really
>> need to validate anything here: if the program says it's writing
>> metadata with a format given by a specific ID, that implies
>> compatibility (or not) as given by the ID. We could sanity-check the
>> metadata area size, but the consumption code has to do that anyway, so
>> I'm not sure it's worth the runtime overhead to have an additional check
>> here?
>
> As you know I hate "runtime checks", and try hard to push checks to
> "setup time".  Maybe we could have verifier (or libbpf) do the check at
> setup/load time, by identifying the helper call and check if provided
> BTF do match COMPAT_COMMON claim.
>
> For this to work, the verifier need to be able to resolve origin
> "module", which happens at BPF load-time, so we would need to set the
> ifindex (curr used for XDP-hardware-offload) at BPF load-time.

If we make the UAPI on the BPF side just accept a full BTF object+type
ID, and also require that the value being passed to the helper is a
compile-time constant (so it is visible to the verifier at verification
time), it is straight-forward for the verifier to just lookup the BTF
type, check if it contains the "hints_common" struct and if it does,
rewrite the helper call to set the right value of the "compat_common"
flag without exposing the flag itself as UAPI.

The driver code would probably still have to set this flag "manually",
but that's internal kernel API, so that's probably fine...

>> As for safety of the metadata content itself, I don't really think we
>> can do anything to guarantee this: in any case the BPF program can pass
>> a valid BTF ID and still write garbage values into the actual fields, so
>> the consumption code has to do enough validation that this won't crash
>> the kernel anyway. But this is no different from the packet data itself:
>> XDP is basically in a position to be a MITM attacker of the network
>> stack itself, which is why loading XDP programs is a privileged
>> operation...
>
> I agree, that we cannot stop the end-user from screwing up their
> BPF-prog to provide garbage in the fields, as long as it doesn't crash
> the kernel.  I do think it would improve usability for end-users if we
> can detect and report that their BPF-prog have gotten out of sync with
> the running kernel and their claim that their BTF layout are
> COMPAT_COMMON isn't actually true.  But I guess it is shouldn't block
> the code, as it's only an extra usability help.

Yeah, I agree this could be error prone; which is why I think not
exposing the flag itself as UAPI is a better solution ;)

-Toke