Re: bpf signing. Re: [POC][RFC][PATCH] bpf: in-kernel bpf relocations on raw elf files

John Fastabend <john.fastabend@xxxxxxxxx> · Thu, 23 Jan 2025 23:05:03 -0800

On 2025-01-23 21:08:14, Alexei Starovoitov wrote:
> On Tue, Jan 14, 2025 at 10:24 AM Blaise Boscaccy
> <bboscaccy@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > It looks like they are done in the kernel and not necessarily by the
> > kernel? The relocation logic is emitted by emit_relo* functions during
> > skeleton generation and the ebpf program is responsible for relocating
> > itself at runtime, correct? Meaning that the same program is going to
> > appear very different to the kernel if it's loaded via lskel or libbpf?
> 
> Looks like you're reading the code without actually trying to run it.
> 
> > >> Would it be amenable to possibly alter the light skeleton generation
> > >> code to pass btf and some other metadata into the kernel along with
> > >> instructions or are you trying to avoid any sort of fixed dependencies
> > >> on anything in the kernel other than the bpf instruction set itself?
> > >
> > > BTF is passed in the lskel.
> > > There are few relocation-like things that lskel doesn't support.
> > > One example is __kconfig, but so far there was no request to support that.
> > > This can be added when needs arise.
> >
> > Yes, I ran into the lskel generator doing fun stuff like:
> >
> > libbpf: extern (kcfg) 'LINUX_KERNEL_VERSION': set to 0x6080c
> >
> > Which caused some concern. Are the feature set for the light skeleton
> > generator and the feature set for libbpf expected to drift, where
> > new features will get added to libbpf but they will get added to the
> > lskel generator if and only if someone requests support for it?
> 
> Correct.
> 
> > Ancillary, would there be opposition to passing the symbol table into
> > the kernel via the light skeleton?
> 
> Yes, if by "symbol table" you mean ELF symbol table.
> 
> > I couldn't find anything tangible related to a 'gate keeper' on the bpf
> > mailing list and haven't attended the conferences.  Are you going to
> > shoot down all attempts at code signing of eBPF programs in the kernel?
> 
> gate keeper concept is the sign verification by the kernel.
> 
> > Internally, we want to cryptographically verify all running kernel code
> > with a proper root of trust. Additionally we've been looking into
> > NIST-800-172 requirements. That's currently making eBPF a no-go.  Root
> > and userspace are not trusted either in these contexts, making userspace
> > gate-keeper daemons unworkable.
> 
> The idea was to add LSM-like hook in the prog loading path where
> "gate keeper" bpf program loaded early during the boot
> (without any user space) would validate the signature attached
> to lskel and whatever other prog attributes it might need.
> 
> KP proposed:
> https://lore.kernel.org/bpf/CACYkzJ6xSk_DHO+3JoCYpGrXjFkk9v-LOSWW0=0KLwAj1Gc0SA@xxxxxxxxxxxxxx/
> 
> iirc John had the whole design proposal written somewhere,
> but I cannot find it now.
> 
> John,
> can you summarize how gate keeper bpf prog would work?

Sure. The gate keeper can attach at bpf_prog_load time; note there is
already a security hook there that we can hook into, with the bpf_prog
struct as the only arg. At this point any number of policies about
what/who can load BPF programs can be applied by looking at the struct
and the context it's being called in. For better use of crypto functions
we would want this to be a sleepable program.

It needs to be a BPF prog in this model because I expect the
policy to be very different depending on the env. We have K8s
systems, DPUs, VMs, and embedded systems all running BPF, and each has
different requirements and different policy metadata.

With BPF/IMA or fsverity infra the caller can be identified by a
hash giving the identity of the loader. This works today.

We can also check a signature of the skel here if needed. Some kfuncs
are probably still needed (and the prog has to be sleepable); I haven't
done this part yet. I found that binding the identity of the loader to
types of programs is a good starting point. A roster of all BPF programs
loaded in a cluster is doable now. Anyway, a kfunc that consumes the
bpf_prog and key details and returns good/bad is probably fine? Or
breaking it down into the individual ops would be more flexible. This
should be enough to solve the requirement to cryptographically verify
BPF programs.
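
To make the "bind loader identity to types of programs" idea concrete, here is a minimal sketch written as a plain userspace C model, not as kernel or BPF code. All names (gk_loader_rule, gk_check_loader, the GK_CLASS_* bits) are hypothetical and only illustrate the lookup; in a real gate keeper the digest would come from IMA/fsverity and the rules would live in a BPF map.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GK_DIGEST_LEN 32

/* Hypothetical program classes; NOT the kernel's enum bpf_prog_type. */
enum gk_prog_class {
    GK_CLASS_TRACING = 1u << 0,
    GK_CLASS_NETWORK = 1u << 1,
    GK_CLASS_LSM     = 1u << 2,
};

/* One policy rule: a loader digest and the classes it may load. */
struct gk_loader_rule {
    uint8_t  digest[GK_DIGEST_LEN];  /* hash identifying the loader */
    uint32_t allowed_classes;        /* OR of enum gk_prog_class bits */
};

/* Return 0 if the loader may load this class, -EPERM otherwise. */
int gk_check_loader(const struct gk_loader_rule *rules, size_t nrules,
                    const uint8_t digest[GK_DIGEST_LEN], uint32_t prog_class)
{
    for (size_t i = 0; i < nrules; i++) {
        if (memcmp(rules[i].digest, digest, GK_DIGEST_LEN) != 0)
            continue;
        return (rules[i].allowed_classes & prog_class) ? 0 : -EPERM;
    }
    return -EPERM;  /* unknown loaders are denied by default */
}
```

The default-deny return for unknown digests is a design assumption of this sketch; a deployment could just as well log and allow during a rollout phase.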

There is also an idea that we could provide more metadata about the
program by having the verifier include a summary. One proposed example
was to track helpers/kfuncs in use. For example, policy could allow a
network program that can inspect traffic, but not redirect it.
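
As a rough illustration of how such a summary could feed policy, the sketch below models it in plain userspace C. The summary format does not exist yet, so the GK_USES_* bits and gk_check_summary are purely hypothetical: the program is accepted only if everything it uses is inside the allowed set.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical capability bits a verifier summary might report. */
#define GK_USES_PKT_READ  (1u << 0)
#define GK_USES_PKT_WRITE (1u << 1)
#define GK_USES_REDIRECT  (1u << 2)

/* Hypothetical per-program summary: which capabilities are in use. */
struct gk_prog_summary {
    uint32_t used;
};

/* Return 0 if every capability in use is allowed, -EPERM otherwise. */
int gk_check_summary(const struct gk_prog_summary *s, uint32_t allowed)
{
    return (s->used & ~allowed) ? -EPERM : 0;
}
```

With allowed set to GK_USES_PKT_READ only, a program that reads packets passes, while one that also redirects is rejected.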

The end result is that we could build a policy that says these loader
programs can load these specific BPF programs, and keep those in maps so
the policy can be updated dynamically on a bunch of running systems. I
think you want the dynamic part so you can have some process to say
"I'm adding these new debug programs or new critical security fixes
to the list of allowed BPF programs".
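
The dynamic part can be modeled with a small updatable table, again as a plain userspace C sketch under stated assumptions. A fixed-size array stands in for a BPF map, and loader_id/prog_id are hypothetical 64-bit identifiers (for instance truncated hashes); none of these names come from the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define GK_MAX_ENTRIES 64

/* One (loader, program) pair that is currently allowed. */
struct gk_entry {
    uint64_t loader_id;
    uint64_t prog_id;
    int      in_use;
};

struct gk_allowlist {
    struct gk_entry e[GK_MAX_ENTRIES];
};

/* Return the slot index of a matching pair, or -1 if absent. */
int gk_find(const struct gk_allowlist *al, uint64_t loader_id, uint64_t prog_id)
{
    for (int i = 0; i < GK_MAX_ENTRIES; i++)
        if (al->e[i].in_use && al->e[i].loader_id == loader_id &&
            al->e[i].prog_id == prog_id)
            return i;
    return -1;
}

/* Add a pair; idempotent. Returns 0, or -ENOSPC when the table is full. */
int gk_allow(struct gk_allowlist *al, uint64_t loader_id, uint64_t prog_id)
{
    if (gk_find(al, loader_id, prog_id) >= 0)
        return 0;
    for (int i = 0; i < GK_MAX_ENTRIES; i++) {
        if (!al->e[i].in_use) {
            al->e[i] = (struct gk_entry){ loader_id, prog_id, 1 };
            return 0;
        }
    }
    return -ENOSPC;
}

/* Remove a pair. Returns 0, or -ENOENT if it was not present. */
int gk_revoke(struct gk_allowlist *al, uint64_t loader_id, uint64_t prog_id)
{
    int i = gk_find(al, loader_id, prog_id);
    if (i < 0)
        return -ENOENT;
    al->e[i].in_use = 0;
    return 0;
}

/* Load-time decision: 0 if the pair is allowed, -EPERM otherwise. */
int gk_check(const struct gk_allowlist *al, uint64_t loader_id, uint64_t prog_id)
{
    return gk_find(al, loader_id, prog_id) >= 0 ? 0 : -EPERM;
}
```

Adding a new debug program is then a gk_allow() call on the running system, and pulling it back is a gk_revoke(), without reloading the gate keeper itself.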

Some other commentary:

Also, to be complete, a way to load BPF programs in early boot would
reduce/eliminate the window between the launch of the trusted kernel and
the launch of the gate keeper.

Either the gate keeper can ensure it can't be unloaded by also
monitoring those paths, or we could just pin a refcnt on it when a
flag is set or when it comes from early boot.

Map updates/manipulation can also wreck BPF logic, so you will want
the gate keeper to track that as well.

As a first step, just making it sleepable and exposing the needed
kfuncs would be relatively easy and would get you what you need, I
suspect. Adding the gate keeper BPF prog at early boot would likely be
all you need?

Thanks,
John