The only thing sitting between our eBPF programs and a deep dark chasm of destruction is the eBPF verifier. Every eBPF program loaded into the kernel is checked by the verifier. It is quite powerful, and provides a facility for introspection of it's internal state so that analysis of the verifier's view of the program can be performed. The verifier performs many tests, but primarily it: 1) Transforms special MAP fd load instructions into MAP pointer one's. Userspace performs MAP loads using a specially code 64-bit load immediate instruction, with the file descriptor in the immediate field. Normally the source register field is zero for a "ldimm64", but for these special MAP fd instructions the src_reg is set to '1' (BPF_PSEUDO_MAP_FD). ldimm64 rN, $FD ! and src_reg set to '1' The verifier uses the FD to look up the map pointer, and rewrites the above instruction into: ldimm64 rN, map_ptr Later, after the program has been validated, the src_reg field will be cleared to zero and then it will be well formed. 2) Build a control flow graph and and verify it. A graph representing the control flow of the eBPF program is built, with edges connecting jumps to the destination basic blocks. The CFG is used to enforce two eBPF rules. a) No back-edges, which means no branching back to earlier instructions in the program and no loops. b) No unreachable instructions. 3) Finally the main full program check which analyzes every instruction, maintaining per-register state, and making sure no invalid operations are performed. One of the major purposes of this pass is to make sure that the dereferencing of pointers is always done in a safe and controlled manner. When values from a known source are loaded into a register, the register acquires a type and this type and the register's other attributes are used to make sure an access is valid. The verifier has to consider all flows of control through the program, to check that all of the necessary constraints are followed no matter what set of paths are used on the way to the final BPF_EXIT of the program. In order to do this, the verifier has a stack of branches it has visited one arm of. So at a jump, the verifier pushes the jump onto a stack, and continues down one of the two possible paths from that jump. Later, after hitting BPF_EXIT, the verifier starts popping entries off the of stack and visiting the opposite jump path. This can get extremely expensive for programs with lots of jumps, so the verifier implements somethign called state pruning to minimize the amount of paths it has to follow. It is quite complicated, but the basic idea is that if we know the we've made more strict determinations about values in registers from the path we've already checked, compared to the path we are considering to take, then we don't have to visit that path at all. Once this step passes, the program has been accepted by the verifier. 4) Context accesses are converted. If you remember from our context discussion the other day, eBPF programs access SKB metadata via the passed in context, like this: SEC("my_program") int my_main(struct __sk_buff *skb) { void *data_end = (void *)(long)skb->data_end; void *data = (void *)(long)skb->data; The "struct __sk_buff" if an abstracted version of the real sk_buff in the kernel. It uses fixed offsets so that we can burn in a eBPF program facing ABI that will never change, whilst we can still make whatever changes we want to the internal kernel sk_buff structure. So at this point the verifier converts the load instructions emitted for those "skb->data" dereferences so that they use the real offset the kernel's sk_buff structure has for those members. 5) Function calls are converted. Helper functions have a fixed code, which gets inserts into the immediate field of the BPF_CALL instructions. The verifier translates this into the actual address of the helper function. Now, I mentioned earlier that the verifier provides an introspection mechanism. This is via the verifier log buffer. When you use the sys_bpf() system call to load a program, several attributes are passed in. One set of those are a LOG buffer pointer, the length of that log, and a loggging level. The verifier will emit every instruction is looks at, and by default, at every basic block boundary, emit the internal register state. If the log level is increased to '1', then the internal register state will be dumped after every instruction. Let's look at an example, for the BPF code sequence: mov r3, 2 mov r3, 4 mov r3, 8 mov r3, 16 mov r3, 32 mov r0, 0 exit The verifier dump at level 1 looks like: 0: R1=ctx R10=fp 0: (b7) r3 = 2 1: R1=ctx R3=imm2,min_value=2,max_value=2,min_align=2 R10=fp 1: (b7) r3 = 4 2: R1=ctx R3=imm4,min_value=4,max_value=4,min_align=4 R10=fp 2: (b7) r3 = 8 3: R1=ctx R3=imm8,min_value=8,max_value=8,min_align=8 R10=fp 3: (b7) r3 = 16 4: R1=ctx R3=imm16,min_value=16,max_value=16,min_align=16 R10=fp 4: (b7) r3 = 32 5: R1=ctx R3=imm32,min_value=32,max_value=32,min_align=32 R10=fp 5: (b7) r0 = 0 6: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R3=imm32,min_value=32,max_value=32,min_align=32 R10=fp 6: (95) exit The first number on each line is the instruction number the verifier is inspecting. The verifier starts with register state: R1=ctx R10=fp which means that R1 contains a non-NULL context pointer, and R10 is a frame pointer. After "mov r3, 2" is analyzed, we have register state: 1: R1=ctx R3=imm2,min_value=2,max_value=2,min_align=2 R10=fp So what's new is that the verifier now sees that reigster R3 contains a constant "2", the value range is 2 - 2, and the value is aligned to "2". You can capture dumps like this quite simply by using the bpf_verify_program() library helper. You can see how this is used in tools/testing/samples/bpf/test_align.c That's all for today...