BPF Verifier Overview

David Miller <davem@xxxxxxxxxxxxx> · Thu, 18 May 2017 14:00:09 -0400 (EDT)

The only thing sitting between our eBPF programs and a deep dark chasm
of destruction is the eBPF verifier.

Every eBPF program loaded into the kernel is checked by the verifier.
It is quite powerful, and it provides a facility for introspecting
its internal state, so that the verifier's view of the program can
be analyzed.

The verifier performs many tests, but primarily it:

1) Transforms special MAP fd load instructions into MAP pointer ones.
   Userspace performs MAP loads using a specially coded 64-bit load
   immediate instruction, with the file descriptor in the immediate
   field.  Normally the source register field is zero for a "ldimm64",
   but for these special MAP fd instructions the src_reg is set to '1'
   (BPF_PSEUDO_MAP_FD).

	ldimm64		rN, $FD		! and src_reg set to '1'

   The verifier uses the FD to look up the map pointer, and rewrites
   the above instruction into:

	ldimm64		rN, map_ptr

   Later, after the program has been validated, the src_reg field will
   be cleared to zero and then it will be well formed.
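
   To make the two-slot encoding concrete, here is a minimal userspace
   sketch.  The struct mirrors the layout of struct bpf_insn from
   <linux/bpf.h>; the helper names (emit_ld_map_fd(), rewrite_map_fd(),
   clear_pseudo(), ldimm64_value()) are hypothetical, and the real
   fd-to-map lookup is replaced by a caller-supplied address.

```c
#include <stdint.h>

/* Mirrors the layout of struct bpf_insn from <linux/bpf.h>. */
struct insn {
	uint8_t code;		/* opcode */
	uint8_t dst_reg:4;	/* destination register */
	uint8_t src_reg:4;	/* source register */
	int16_t off;		/* signed offset */
	int32_t imm;		/* signed immediate */
};

#define LD_IMM64_OPCODE	0x18	/* BPF_LD | BPF_IMM | BPF_DW */
#define PSEUDO_MAP_FD	1	/* value of BPF_PSEUDO_MAP_FD */

/* What userspace emits: a two-slot ldimm64 carrying the map fd in imm. */
static void emit_ld_map_fd(struct insn insn[2], int dst, int fd)
{
	insn[0] = (struct insn){ .code = LD_IMM64_OPCODE, .dst_reg = dst,
				 .src_reg = PSEUDO_MAP_FD, .imm = fd };
	insn[1] = (struct insn){ 0 };	/* second slot: upper 32 bits of imm */
}

/* The verifier's rewrite, in spirit: the fd is replaced by the 64-bit
 * map pointer, split across the two imm fields.  'map_addr' stands in
 * for the result of the fd -> map lookup, which is not modeled here. */
static void rewrite_map_fd(struct insn insn[2], uint64_t map_addr)
{
	insn[0].imm = (int32_t)(uint32_t)map_addr;
	insn[1].imm = (int32_t)(uint32_t)(map_addr >> 32);
}

/* After validation succeeds, src_reg is cleared: a plain ldimm64. */
static void clear_pseudo(struct insn insn[2])
{
	insn[0].src_reg = 0;
}

/* Reassemble the 64-bit immediate from the two slots. */
static uint64_t ldimm64_value(const struct insn insn[2])
{
	return (uint64_t)(uint32_t)insn[0].imm |
	       ((uint64_t)(uint32_t)insn[1].imm << 32);
}
```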

2) Builds a control flow graph and verifies it.  A graph representing the
   control flow of the eBPF program is built, with edges connecting jumps
   to the destination basic blocks.

   The CFG is used to enforce two eBPF rules.

   a) No back-edges, which means no branching back to earlier instructions
      in the program and no loops.
   b) No unreachable instructions.
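
   Both rules fall out of a single depth-first search.  As I understand
   it, the kernel's check_cfg() uses an iterative DFS for this; the
   sketch below uses a recursive one over a toy instruction model (the
   enum kind, struct tinsn, dfs() and check_cfg_toy() are hypothetical,
   not the real encoding or code).  An edge to an instruction still on
   the DFS path is a back-edge; anything never visited is unreachable.

```c
/* A toy instruction model, not the real eBPF encoding. */
enum kind { FALLTHROUGH, COND_JMP, JA, EXIT };

struct tinsn {
	enum kind kind;
	int off;	/* jump target is pc + 1 + off, as in eBPF */
};

enum cfg_result { CFG_OK, CFG_BACK_EDGE, CFG_UNREACHABLE,
		  CFG_BAD_JUMP, CFG_TOO_BIG };

enum { UNSEEN, ON_STACK, DONE };	/* per-instruction DFS marks */
#define MAX_TOY_INSNS 64

/* Depth-first walk from 'pc'.  An edge to an instruction that is still
 * ON_STACK closes a cycle, i.e. it is a back-edge. */
static enum cfg_result dfs(const struct tinsn *prog, int len, int pc,
			   int *state)
{
	int succ[2], n = 0;

	state[pc] = ON_STACK;
	switch (prog[pc].kind) {
	case FALLTHROUGH:
		succ[n++] = pc + 1;
		break;
	case COND_JMP:
		succ[n++] = pc + 1;			/* branch not taken */
		succ[n++] = pc + 1 + prog[pc].off;	/* branch taken */
		break;
	case JA:
		succ[n++] = pc + 1 + prog[pc].off;
		break;
	case EXIT:
		break;
	}
	for (int i = 0; i < n; i++) {
		int t = succ[i];
		enum cfg_result r;

		if (t < 0 || t >= len)
			return CFG_BAD_JUMP;	/* jump or fall-through out of range */
		if (state[t] == ON_STACK)
			return CFG_BACK_EDGE;
		if (state[t] == UNSEEN) {
			r = dfs(prog, len, t, state);
			if (r != CFG_OK)
				return r;
		}
	}
	state[pc] = DONE;
	return CFG_OK;
}

/* CFG_OK only if there are no loops, no bad jumps and no dead code. */
static enum cfg_result check_cfg_toy(const struct tinsn *prog, int len)
{
	int state[MAX_TOY_INSNS] = { 0 };	/* all UNSEEN */
	enum cfg_result r;

	if (len <= 0 || len > MAX_TOY_INSNS)	/* empty or over the toy limit */
		return CFG_TOO_BIG;
	r = dfs(prog, len, 0, state);
	if (r != CFG_OK)
		return r;
	for (int pc = 0; pc < len; pc++)
		if (state[pc] == UNSEEN)
			return CFG_UNREACHABLE;
	return CFG_OK;
}
```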

3) Next comes the main full-program check, which analyzes every
   instruction, maintaining per-register state and making sure no
   invalid operations are performed.

   One of the major purposes of this pass is to make sure that the
   dereferencing of pointers is always done in a safe and controlled
   manner.  When values from a known source are loaded into a register,
   the register acquires a type and this type and the register's other
   attributes are used to make sure an access is valid.
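
   One concrete case of this typing is the pointer returned by a map
   lookup helper, which may be NULL: it may only be dereferenced after
   the program has compared it against zero.  The sketch below models
   that with a tiny set of types; the enum names are loosely modeled on
   the verifier's register types, while struct reg, load_ok() and
   mark_not_null() are hypothetical.

```c
#include <stdbool.h>

/* A few register types, loosely named after the verifier's. */
enum reg_type {
	NOT_INIT,			/* never written */
	SCALAR,				/* plain number, not a pointer */
	PTR_TO_MAP_VALUE_OR_NULL,	/* result of a map lookup */
	PTR_TO_MAP_VALUE,		/* same, after a NULL check */
};

struct reg {
	enum reg_type type;
	int size;	/* bytes addressable through the pointer */
};

/* May we load 'len' bytes at offset 'off' through this register? */
static bool load_ok(const struct reg *r, int off, int len)
{
	if (r->type != PTR_TO_MAP_VALUE)
		return false;	/* scalars, uninit and maybe-NULL pointers */
	return off >= 0 && len > 0 && off + len <= r->size;
}

/* On the non-NULL side of "if (rN != 0)" the type is upgraded. */
static void mark_not_null(struct reg *r)
{
	if (r->type == PTR_TO_MAP_VALUE_OR_NULL)
		r->type = PTR_TO_MAP_VALUE;
}
```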

   The verifier has to consider all flows of control through the
   program, to check that all of the necessary constraints are
   followed no matter which path is taken on the way to the
   final BPF_EXIT of the program.

   In order to do this, the verifier keeps a stack of branches of
   which it has visited only one arm.  So at a conditional jump, the
   verifier pushes the other arm, together with a copy of the current
   register state, onto the stack, and continues down one of the two
   possible paths from that jump.

   Later, after hitting BPF_EXIT, the verifier starts popping entries
   off of the stack and visiting the opposite jump path.  This can
   get extremely expensive for programs with lots of jumps, so the
   verifier implements something called state pruning to minimize
   the number of paths it has to follow.
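
   The push/pop walk can be sketched in a few lines, assuming a
   loop-free toy instruction model (enum kind, struct tinsn and
   explore_all_paths() are hypothetical).  It pushes the taken arm and
   follows the fall-through arm first; the real verifier also saves the
   register state with each entry, which is omitted here.

```c
/* A toy instruction model, not the real eBPF encoding. */
enum kind { FALLTHROUGH, COND_JMP, JA, EXIT };

struct tinsn {
	enum kind kind;
	int off;	/* jump target is pc + 1 + off */
};

#define MAX_PENDING 256

/* Walks every path from insn 0 to an exit, depth-first, keeping a stack
 * of arms still to visit.  Assumes the CFG check already rejected loops.
 * Returns the number of paths walked, or -1 on an out-of-range pc or a
 * full stack. */
static long explore_all_paths(const struct tinsn *prog, int len)
{
	int stack[MAX_PENDING];
	int sp = 0;
	long paths = 0;
	int pc = 0;

	for (;;) {
		if (pc < 0 || pc >= len)
			return -1;
		switch (prog[pc].kind) {
		case FALLTHROUGH:
			pc++;
			break;
		case JA:
			pc += 1 + prog[pc].off;
			break;
		case COND_JMP:
			if (sp == MAX_PENDING)
				return -1;
			stack[sp++] = pc + 1 + prog[pc].off;	/* visit later */
			pc++;					/* fall through now */
			break;
		case EXIT:
			paths++;
			if (sp == 0)
				return paths;
			pc = stack[--sp];	/* resume the other arm */
			break;
		}
	}
}
```

   With n back-to-back if/else "diamonds" there are 2^n paths: for
   n = 3 the walk counts 8, and every extra diamond doubles the work.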

   It is quite complicated, but the basic idea is this: if the path we
   are considering reaches an instruction with register state that is
   at least as tightly constrained as a state that was already verified
   as safe at that same instruction on a path we've already checked,
   then we don't have to visit the rest of that path at all.
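
   A minimal sketch of that test, assuming the only thing tracked per
   register is a signed [min, max] range (the real comparison also
   covers pointer types, stack slots and more).  A path can be pruned
   when each of its register ranges fits inside the corresponding range
   of the already-verified state.  struct range, struct state, covers()
   and can_prune() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define NREGS 11	/* R0..R10 */

/* What we know about one register: its value lies in [min, max]. */
struct range {
	int64_t min, max;
};

struct state {
	struct range r[NREGS];
};

/* Does the already-verified range 'old' include every value 'cur' allows? */
static bool covers(const struct range *old, const struct range *cur)
{
	return old->min <= cur->min && cur->max <= old->max;
}

/* 'old' was proven safe from this instruction onward.  If it covers
 * 'cur' register by register, whatever 'cur' can do was already checked. */
static bool can_prune(const struct state *old, const struct state *cur)
{
	for (int i = 0; i < NREGS; i++)
		if (!covers(&old->r[i], &cur->r[i]))
			return false;
	return true;
}
```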

   Once this step passes, the program has been accepted by the
   verifier.

4) Context accesses are converted.

   If you remember from our context discussion the other day, eBPF
   programs access SKB metadata via the passed-in context, like this:

	SEC("my_program")
	int my_main(struct __sk_buff *skb)
	{
		void *data_end = (void *)(long)skb->data_end;
		void *data = (void *)(long)skb->data;

		/* ... check data + N <= data_end, then parse ... */
		return 0;
	}

   The "struct __sk_buff" is an abstracted version of the real sk_buff
   in the kernel.  It uses fixed offsets so that we can burn in an
   eBPF-program-facing ABI that will never change, whilst we can
   still make whatever changes we want to the internal kernel sk_buff
   structure.

   So at this point the verifier converts the load instructions emitted
   for those "skb->data" dereferences so that they use the real offset
   the kernel's sk_buff structure has for those members.
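
   To illustrate the idea, here is a sketch of the offset translation
   with two made-up structures standing in for struct __sk_buff and the
   kernel's sk_buff; struct uapi_ctx, struct kern_buf and
   convert_ctx_off() are all hypothetical.  The real conversion also
   changes the emitted load itself (for example, skb->data becomes a
   pointer-sized load even though the UAPI field is 32 bits), which is
   only noted in a comment here.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-layout context that programs see (plays struct __sk_buff). */
struct uapi_ctx {
	uint32_t len;
	uint32_t data;
	uint32_t data_end;
};

/* Internal structure whose layout may change (plays sk_buff). */
struct kern_buf {
	uint64_t some_internal_field;
	unsigned char *data;
	unsigned int len;
	unsigned char *end;	/* hypothetical source for data_end */
};

/* Map a load offset in the UAPI struct to the offset of the matching
 * member in the internal struct; -1 means "invalid context access".
 * A real rewrite would also adjust the load size (u32 -> pointer). */
static long convert_ctx_off(size_t uapi_off)
{
	switch (uapi_off) {
	case offsetof(struct uapi_ctx, len):
		return offsetof(struct kern_buf, len);
	case offsetof(struct uapi_ctx, data):
		return offsetof(struct kern_buf, data);
	case offsetof(struct uapi_ctx, data_end):
		return offsetof(struct kern_buf, end);
	default:
		return -1;
	}
}
```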

5) Function calls are converted.

   Each helper function has a fixed ID number, which gets inserted into
   the immediate field of the BPF_CALL instruction.  The verifier
   translates this into the actual address of the helper function.
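
   A sketch of the ID-to-function translation, using a made-up helper
   table: helper_get_answer(), FUNC_get_answer, helper_table and
   resolve_helper() are hypothetical, while real IDs come from enum
   bpf_func_id in <linux/bpf.h>.  As far as I know the kernel does not
   store a raw 64-bit pointer back into the 32-bit immediate; it stores
   the helper's offset from a base symbol, __bpf_call_base, which this
   sketch does not model.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers standing in for real BPF helpers. */
static uint64_t helper_get_answer(void) { return 42; }
static uint64_t helper_get_zero(void)   { return 0; }

typedef uint64_t (*helper_fn)(void);

/* Hypothetical IDs; 0 is reserved as "unspecified". */
enum { FUNC_unspec, FUNC_get_answer, FUNC_get_zero, FUNC_MAX };

static const helper_fn helper_table[FUNC_MAX] = {
	[FUNC_get_answer] = helper_get_answer,
	[FUNC_get_zero]   = helper_get_zero,
};

/* Resolve a BPF_CALL immediate (a helper ID) to the function it names.
 * Returns NULL for unknown IDs, which a verifier would reject. */
static helper_fn resolve_helper(int32_t imm)
{
	if (imm <= FUNC_unspec || imm >= FUNC_MAX)
		return NULL;
	return helper_table[imm];
}
```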

Now, I mentioned earlier that the verifier provides an introspection
mechanism.  This is via the verifier log buffer.

When you use the sys_bpf() system call to load a program, several
attributes are passed in.  Among them are a log buffer pointer, the
length of that buffer, and a logging level.
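
For reference, here is a hedged sketch of passing those three
attributes with the raw bpf(2) system call, using union bpf_attr from
<linux/bpf.h> (its log_buf, log_size and log_level fields).  The helper
names fill_prog_load_attr() and load_with_log() are hypothetical, the
socket-filter program type is an arbitrary choice, and actually loading
a program requires sufficient privileges.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* Fill in BPF_PROG_LOAD attributes, including the three log fields. */
static void fill_prog_load_attr(union bpf_attr *attr,
				const struct bpf_insn *insns, uint32_t insn_cnt,
				char *log_buf, uint32_t log_size,
				uint32_t log_level)
{
	memset(attr, 0, sizeof(*attr));	/* unused fields must be zero */
	attr->prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
	attr->insns = ptr_to_u64(insns);
	attr->insn_cnt = insn_cnt;
	attr->license = ptr_to_u64("GPL");
	attr->log_buf = ptr_to_u64(log_buf);	/* where the verifier writes */
	attr->log_size = log_size;		/* size of that buffer */
	attr->log_level = log_level;		/* requested verbosity */
}

/* Load a program; afterwards log_buf holds the verifier's output.
 * Returns a program fd, or -1 with errno set. */
static int load_with_log(const struct bpf_insn *insns, uint32_t insn_cnt,
			 char *log_buf, uint32_t log_size, uint32_t log_level)
{
	union bpf_attr attr;

	fill_prog_load_attr(&attr, insns, insn_cnt, log_buf, log_size,
			    log_level);
	if (log_size)
		log_buf[0] = '\0';
	return (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
```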

The verifier will emit every instruction it looks at, and at log
level '1' it also emits the internal register state at every basic
block boundary.  If the log level is increased to '2', then the
internal register state will be dumped after every instruction.

Let's look at an example, for the BPF code sequence:

	mov	r3, 2
	mov	r3, 4
	mov	r3, 8
	mov	r3, 16
	mov	r3, 32
	mov	r0, 0
	exit

The verifier dump at level 2 looks like:

	0: R1=ctx R10=fp
	0: (b7) r3 = 2
	1: R1=ctx R3=imm2,min_value=2,max_value=2,min_align=2 R10=fp
	1: (b7) r3 = 4
	2: R1=ctx R3=imm4,min_value=4,max_value=4,min_align=4 R10=fp
	2: (b7) r3 = 8
	3: R1=ctx R3=imm8,min_value=8,max_value=8,min_align=8 R10=fp
	3: (b7) r3 = 16
	4: R1=ctx R3=imm16,min_value=16,max_value=16,min_align=16 R10=fp
	4: (b7) r3 = 32
	5: R1=ctx R3=imm32,min_value=32,max_value=32,min_align=32 R10=fp
	5: (b7) r0 = 0
	6: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R3=imm32,min_value=32,max_value=32,min_align=32 R10=fp
	6: (95) exit

The first number on each line is the instruction number the verifier
is inspecting.  The verifier starts with register state:

	R1=ctx R10=fp

which means that R1 contains a non-NULL context pointer, and R10 is
a frame pointer.

After "mov r3, 2" is analyzed, we have register state:

	1: R1=ctx R3=imm2,min_value=2,max_value=2,min_align=2 R10=fp

So what's new is that the verifier now sees that register R3 contains
a constant "2", the value range is 2 - 2, and the value is aligned
to "2".
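
The min_align values in the dump, including the odd-looking 2147483648
for the zero loaded into R0, are consistent with taking the lowest set
bit of the constant and treating zero as aligned to 2^31.  Here is a
sketch of that bookkeeping for "mov rN, imm"; struct const_reg,
calc_align() and mark_const() are hypothetical names, and this is not
claimed to be the verifier's own code.

```c
#include <stdint.h>

/* What gets recorded for a register after "mov rN, imm"; field names
 * follow the log output above. */
struct const_reg {
	int64_t imm;
	int64_t min_value;
	int64_t max_value;
	uint32_t min_align;
};

/* Largest power of two dividing 'imm', i.e. its lowest set bit.  Zero
 * is divisible by everything, so it gets the largest 32-bit power of 2. */
static uint32_t calc_align(uint32_t imm)
{
	if (imm == 0)
		return 1u << 31;
	return imm & -imm;
}

/* A known constant: the range collapses to a single value. */
static struct const_reg mark_const(int32_t imm)
{
	struct const_reg r = {
		.imm = imm,
		.min_value = imm,
		.max_value = imm,
		.min_align = calc_align((uint32_t)imm),
	};
	return r;
}
```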

You can capture dumps like this quite simply by using the
bpf_verify_program() library helper.  You can see how it is
used in tools/testing/selftests/bpf/test_align.c.

That's all for today...