Re: [PATCH RFC bpf-next 5/9] xdp: controlling XDP-hints from BPF-prog via helper

"Zaremba, Larysa" <larysa.zaremba@xxxxxxxxx> · Mon, 4 Jul 2022 11:00:56 +0000

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> writes:
> 
> Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx> writes:
> 
> > On 29/06/2022 16.20, Toke Høiland-Jørgensen wrote:
> >> Jesper Dangaard Brouer <brouer@xxxxxxxxxx> writes:
> >>
> >>> XDP BPF-prog's need a way to interact with the XDP-hints. This 
> >>> patch introduces a BPF-helper function, that allow XDP BPF-prog's 
> >>> to interact with the XDP-hints.
> >>>
> >>> BPF-prog can query if any XDP-hints have been setup and if this is 
> >>> compatible with the xdp_hints_common struct. If XDP-hints are 
> >>> available the BPF "origin" is returned (see enum
> >>> xdp_hints_btf_origin) as BTF can come from different sources or 
> >>> origins
> e.g. vmlinux, module or local.
> >>
> >> I'm not sure I quite understand what this origin is supposed to be 
> >> good for?
> >
> > Some background info on BTF is needed here: BTF_ID numbers are not 
> > globally unique identifiers, thus we need to know where it originate 
> > from, to make it unique (as we store this BTF_ID in XDP-hints).
> >
> > There is a connection between origin "vmlinux" and "module", which 
> > is that vmlinux will start at ID=1 and end at a max ID number.  
> > Modules refer to ID's in "vmlinux", and for this to work, they will 
> > shift their own numbering to start after ID=max-vmlinux-id.
> >
> > Origin "local" is for BTF information stored in the BPF-ELF object file.
> > Their numbering starts at ID=1.  The use-case is that a BPF-prog 
> > want to extend the kernel drivers BTF-layout, and e.g. add a 
> > RX-timestamp like [1].  Then BPF-prog can check if it knows module's 
> > BTF_ID and then extend via bpf_xdp_adjust_meta, and update BTF_ID in 
> > XDP-hints and call the helper (I introduced) marking this as origin 
> > "local" for kernel to know this is no-longer origin "module".
> 
> Right, I realise that :)
> 
> My point was that just knowing "this is a BTF ID coming from a module"
> is not terribly useful; you could already figure that out by just 
> looking at the ID and seeing if it's larger than the maximum ID in vmlinux BTF.
> 
> Rather, what we need is a way to identify *which* module the BTF ID 
> comes from; and luckily, the kernel assigns a unique ID to every BTF
> *object* as well as to each type ID within that object. These can be 
> dumped by bpftool:
> 
> # bpftool btf
> bpftool btf
> [sudo] password for alrua:
> 1: name [vmlinux]  size 4800187B
> 2: name [serio]  size 2588B
> 3: name [i8042]  size 11786B
> 4: name [rng_core]  size 8184B
> [...]
> 2062: name <anon>  size 36965B
> 	pids bpftool(547298)
> 
> IDs 2-4 are module BTF objects, and that last one is the ID of a BTF 
> object loaded along with a BPF program by bpftool itself... So we *do* 
> in fact have a unique ID, by combining the BTF object ID with the type 
> ID; this is what Alexander is proposing to put into the xdp-hints 
> struct as well (combining the two IDs into a single u64).

That's correct, concept was previously discussed [1]. The ID of BTF object wasn't
exposed in CO-RE allocations though, we've changed it in the first 4 patches.
The main logic is in "libbpf: factor out BTF loading from load_module_btfs()"
and "libbpf: patch module BTF ID into BPF insns".

We have a sample that wasn't included eventually, but can possibly
give a general understanding of our approach [2].

[1] https://lore.kernel.org/all/CAEf4BzZO=7MKWfx2OCwEc+sKkfPZYzaELuobi4q5p1bOKk4AQQ@xxxxxxxxxxxxxx/
[2] https://github.com/alobakin/linux/pull/16/files#diff-c5983904cbe0c280453d59e8a1eefb56c67018c38d5da0c1122abc86225fc7c9

> >> What is a BPF (or AF_XDP) program supposed to do with the 
> >> information "this XDP hints struct came from a module?" without 
> >> knowing which module that was?
> >
> > For AF_XDP my claim is the userspace program will already know that 
> > driver it is are talking to because it need to "bind" to a specific 
> > interface (and attach to XDP-prog to ifindex). See sample code[2] 
> > for get_driver_name from ifindex.
> > Thus, part of using XDP-hints already involves (resolving and) 
> > opening /sys/kernel/btf/driver_name.  So the origin "module" is 
> > enough for the API end-user to make the BTF_ID unique.
> 
> This will probably work in the most common cases, but offers no way to 
> verify that this "offline" resolution of module ID is actually correct.
> Explicitly encoding the full unique ID will be more robust.
> 
> > Runtime the BPF-prog and kernel can find out what net_device the 
> > origin "module" refers to via xdp_buff->rxq->dev.  When an 
> > end-user/program attach XDP they also need to know the ifindex, 
> > again giving them knowledge that make origin "module" BTF_ID's 
> > unique for them,
> 
> Right, but then the BPF program needs to keep its own lookup table 
> from ifindex to BTF ID? If we just encode the full ID in the packet, 
> it's a simple check, and we can likely create a "magic" CO-RE macro 
> that turns a struct definition into the right ID check at load time...
> 
> >> Ultimately, the origin is useful for a consumer to check that the 
> >> metadata is in the format that it's expecting it to be in (so it 
> >> can just load the data from the appropriate offsets). But to answer 
> >> this, we really need a unique identifier; so I think the approach 
> >> in Alexander's series of encoding the ID of the BTF structure 
> >> itself into the next 32 bits is better? That way we'll have a unique "pointer"
> >> to the actual struct that's in the metadata area and can act on this.
> >
> > I would really like an explanation from Alexander, how his approach 
> > creates unique identifier across all kernel modules.  I don't get it 
> > from reading the code.  To me it looks like some extra BTF "type"
> > information about the BTF_ID.
> >
> > E.g. how do BTF "local" BPF-ELF object's get a unique identifier, 
> > that doesn't overlap with e.g. kernel modules?
> 
> See above: the kernel generates a unique (until the next reboot) ID 
> for every BTF object when it's loaded into the kernel.
> 
> >>> RFC/TODO: Improve patch: Can verifier validate provided BTF on
> "update"
> >>> and detect if compatible with common struct???
> >>
> >> If we have the unique ID as mentioned above, I think the kernel 
> >> probably could resolve this automatically: whenever a module is 
> >> loaded, the kernel could walk the BTF information from that module 
> >> an simply inspect all the metadata structs and see if they contain 
> >> the embedded xdp_hints_common struct. The IDs of any metadata 
> >> structs that do contain the common struct can then be kept in a 
> >> central lookup table and the consumption code can then simply 
> >> compare the BTF ID to this table when building an SKB?
> >
> > I'm not against the idea for the kernel to keep track of these structs.
> > I just don't like the idea of checking this runtime, especially as 
> > this approach for walking all other modules BTF struct's doesn't scale.
> 
> Yeah, we should optimise this. See below...
> 
> >> As for the validation on the BPF side:n
> >>
> >>> +	if (flags & HINTS_BTF_UPDATE) {
> >>> +		is_compat_common = !!(flags &
> HINTS_BTF_COMPAT_COMMON);
> >>> +	/* TODO: Can kernel validate if hints are BTF compat with common?
> */
> >>> +	/* TODO: Could BPF prog provide BTF as ARG_PTR_TO_BTF_ID to
> prove
> >>> +compat_common ? */
> >>
> >> If we use the "global ID + lookup table" approach above, we don't 
> >> really need to validate anything here: if the program says it's 
> >> writing metadata with a format given by a specific ID, that implies 
> >> compatibility (or not) as given by the ID. We could sanity-check 
> >> the metadata area size, but the consumption code has to do that 
> >> anyway, so I'm not sure it's worth the runtime overhead to have an 
> >> additional check here?
> >
> > As you know I hate "runtime checks", and try hard to push checks to 
> > "setup time".  Maybe we could have verifier (or libbpf) do the check 
> > at setup/load time, by identifying the helper call and check if 
> > provided BTF do match COMPAT_COMMON claim.
> >
> > For this to work, the verifier need to be able to resolve origin 
> > "module", which happens at BPF load-time, so we would need to set 
> > the ifindex (curr used for XDP-hardware-offload) at BPF load-time.
> 
> If we make the UAPI on the BPF side just accept a full BTF object+type 
> ID, and also require that the value being passed to the helper is a 
> compile-time constant (so it is visible to the verifier at 
> verification time), it is straight- forward for the verifier to just 
> lookup the BTF type, check if it contains the "hints_common" struct 
> and if it does, rewrite the helper call to set the right value of the "compat_common"
> flag without exposing the flag itself as UAPI.
> 
> The driver code would probably still have to set this flag "manually", 
> but that's internal kernel API, so that's probably fine...
> 
> >> As for safety of the metadata content itself, I don't really think 
> >> we can do anything to guarantee this: in any case the BPF program 
> >> can pass a valid BTF ID and still write garbage values into the 
> >> actual fields, so the consumption code has to do enough validation 
> >> that this won't crash the kernel anyway. But this is no different 
> >> from the packet
> data itself:
> >> XDP is basically in a position to be a MITM attacker of the network 
> >> stack itself, which is why loading XDP programs is a privileged 
> >> operation...
> >
> > I agree, that we cannot stop the end-user from screwing up their 
> > BPF-prog to provide garbage in the fields, as long as it doesn't 
> > crash the kernel.  I do think it would improve usability for 
> > end-users if we can detect and report that their BPF-prog have 
> > gotten out of sync with the running kernel and their claim that 
> > their BTF layout are COMPAT_COMMON isn't actually true.  But I guess 
> > it is shouldn't block the code, as it's only an extra usability help.
> 
> Yeah, I agree this could be error prone; which is why I think not 
> exposing the flag itself as UAPI is a better solution ;)
> 
> -Toke

Please CC me in the hints discussions.
- Larysa