On Thu, 2023-01-26 at 11:50 -0800, Andrii Nakryiko wrote: > On Tue, Jan 24, 2023 at 2:04 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > > > This is a followup for [1], adds an overview for the register liveness > > tracking, covers the following points: > > - why register liveness tracking is useful; > > - how register parentage chains are constructed; > > - how liveness marks are applied using the parentage chains. > > > > [1] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@xxxxxxxxxxxxxx/ > > > > Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx> > > --- > > Documentation/bpf/verifier.rst | 237 +++++++++++++++++++++++++++++++++ > > 1 file changed, 237 insertions(+) > > > > diff --git a/Documentation/bpf/verifier.rst b/Documentation/bpf/verifier.rst > > index d4326caf01f9..77578ed5a277 100644 > > --- a/Documentation/bpf/verifier.rst > > +++ b/Documentation/bpf/verifier.rst > > @@ -316,6 +316,243 @@ Pruning considers not only the registers but also the stack (and any spilled > > registers it may hold). They must all be safe for the branch to be pruned. > > This is implemented in states_equal(). > > > > +Some technical details about state pruning implementation could be found below. > > + > > +Register liveness tracking > > +-------------------------- > > + > > +In order to make state pruning effective liveness state is tracked for each > > +register and stack spill slot. The basic idea is to identify which registers or > > not just spill slot, right? Any stack slot (including scalars) Yes, I'll update the wording to be "stack slot". > > > +stack slots were used by children of the state to reach the program exit. > > +Registers that were never used could be removed from the cached state thus > > +making more states equivalent to a cached state. This could be illustrated by > > +the following program:: > > + > > + 0: call bpf_get_prandom_u32() > > + 1: r1 = 0 > > + 2: if r0 == 0 goto +1 > > + 3: r0 = 1 > > + --- checkpoint --- > > + 4: r0 = r1 > > + 5: exit > > + > > +Suppose that state cache entry is created at instruction #4 (such entries are > > +also called "checkpoints" in the text below), verifier reaches this instruction > > +in two states: > > +* r0 = 1, r1 = 0 > > +* r0 = 0, r1 = 0 > > + > > +However, only the value of register ``r1`` is important to successfully finish > > +verification. The goal of liveness tracking algorithm is to spot this fact and > > +figure out that both states are actually equivalent. > > + > > +Data structures > > +~~~~~~~~~~~~~~~ > > + > > +Liveness is tracked using the following data structures:: > > + > > + enum bpf_reg_liveness { > > + REG_LIVE_NONE = 0, > > + REG_LIVE_READ32 = 0x1, > > + REG_LIVE_READ64 = 0x2, > > + REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64, > > + REG_LIVE_WRITTEN = 0x4, > > + REG_LIVE_DONE = 0x8, > > + }; > > + > > + struct bpf_reg_state { > > + ... > > + struct bpf_reg_state *parent; > > + ... > > + enum bpf_reg_liveness live; > > + ... > > + }; > > + > > + struct bpf_stack_state { > > + struct bpf_reg_state spilled_ptr; > > + ... > > + }; > > + > > + struct bpf_func_state { > > + struct bpf_reg_state regs[MAX_BPF_REG]; > > + ... > > + struct bpf_stack_state *stack; > > + } > > + > > + struct bpf_verifier_state { > > + struct bpf_func_state *frame[MAX_CALL_FRAMES]; > > + struct bpf_verifier_state *parent; > > + ... > > + } > > + > > +* ``REG_LIVE_NONE`` is an initial value assigned to ``->live`` fields upon new > > + verifier state creation; > > +* ``REG_LIVE_WRITTEN`` means that the value of the register (or spill slot) is > > + defined by some instruction "executed" within verifier state; > > +* ``REG_LIVE_READ{32,64}`` means that the value of the register (or spill slot) > > ditto, "stack slot" is more generic and correct? > > > + is read by some instruction "executed" within verifier state; > > +* ``REG_LIVE_DONE`` is a marker used by ``clean_verifier_state()`` to avoid > > + processing same verifier state multiple times and for some sanity checks; > > +* ``->live`` field values are formed by combining ``enum bpf_reg_liveness`` > > + values using bitwise or. > > + > > +Register parent chains > > +~~~~~~~~~~~~~~~~~~~~~~ > > + > > +In order to propagate information between parent and child states register > > +parentage chain is established. Each register or spilled stack slot is linked to > > +a corresponding register or stack spill slot in it's parent state via a > > +``->parent`` pointer. This link is established upon state creation in function > > +``is_state_visited()`` and is never changed. > > + > > +The rules for correspondence between registers / stack spill slots are as > > +follows: > > + > > +* For current stack frame registers and stack spill slots of the new state are > > + linked to the registers and stack spill slots of the parent state with the > > + same indices. > > + > > +* For outer stack frames only caller saved registers (r6-r9) and stack spill > > + slots are linked to the registers and stack spill slots of the parent state > > + with the same indices. > > + > > +This could be illustrated by the following diagram (arrows stand for > > +``->parent`` pointers):: > > + > > + ... ; Frame #0, some instructions > > + --- checkpoint #0 --- > > + 1 : r6 = 42 ; Frame #0 > > + --- checkpoint #1 --- > > + 2 : call foo() ; Frame #0 > > + ... ; Frame #1, instructions from foo() > > + --- checkpoint #2 --- > > + ... ; Frame #1, instructions from foo() > > + --- checkpoint #3 --- > > + exit ; Frame #1, return from foo() > > + 3 : r1 = r6 ; Frame #0 <- current state > > + > > + +--------------------------+--------------------------+ > > + | Frame #0 | Frame #1 | > > + Checkpoint +--------------------------+--------------------------+ > > + #0 | r0-r5 | r6-r9 | fp-8 ... | > > + +--------------------------+ > > + ^ ^ ^ > > + | | | > > + Checkpoint +--------------------------+ > > + #1 | r0-r5 | r6-r9 | fp-8 ... | > > + +--------------------------+ > > + ^ ^ > > + | | > > + Checkpoint +--------------------------+--------------------------+ > > + #2 | r0-r5 | r6-r9 | fp-8 ... | r0-r5 | r6-r9 | fp-8 ... | > > + +--------------------------+--------------------------+ > > wouldn't r6-r9 in frame #1 point to r6-r9 of frame #0 in parent state? > Or do they point to r6-r9 of frame #0 in the same checkpoint? Either > way, should we clarify that part on the diagram somehow? This is the loop at the end of is_state_visited() that sets up the register links (initially the links are zero because of the kzalloc() call used for memory allocation): cur->parent = new; ... for (j = 0; j <= cur->curframe; j++) { for (i = j < cur->curframe ? BPF_REG_6 : 0; i < BPF_REG_FP; i++) cur->frame[j]->regs[i].parent = &new->frame[j]->regs[i]; ... } The `new` refers to `struct bpf_verifier_state` instance that would remain in cache, the `cur` refers to the state that would be used to continue verification. Note that all links in this loop are setup only for identical indices, so registers from frame #1 can't point to registers from frame #0. As far as I understand the following happens (please correct me if I'm wrong): - when checkpoint #1 is created from the state derived from checkpoint #0, the cur->curframe is 0; - when `call foo` is processed a new instance of `struct bpf_func_state` is allocated to represent frame #1, all parent pointers within this instance are set to null (see call `kzalloc(*callee)` in __check_func_call()); - checkpoint #2 is derived from a state described above, thus frame #1 r0-r9 parent links are null. I'll add a note on the diagram that the links are null. > > > + ^ ^ ^ ^ ^ > > + | | | | | > > + Checkpoint +--------------------------+--------------------------+ > > + #3 | r0-r5 | r6-r9 | fp-8 ... | r0-r5 | r6-r9 | fp-8 ... | > > + +--------------------------+--------------------------+ > > + ^ ^ > > + | | > > + Current +--------------------------+ > > + state | r0-r5 | r6-r9 | fp-8 ... | > > + +--------------------------+ > > + \ > > + r6 read mark is propagated via > > + these links all the way up to > > + checkpoint #1. > > + > > +Liveness marks tracking > > +~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +For each processed instruction verifier propagates information about reads up > > +the parentage chain and saves information about writes in the current state. > > +The information about reads is propagated by function ``mark_reg_read()`` which > > +could be summarized as follows:: > > + > > + mark_reg_read(struct bpf_reg_state *state): > > + parent = state->parent > > + while parent: > > + if parent->live & REG_LIVE_WRITTEN: > > + break > > + if parent->live & REG_LIVE_READ64: > > + break > > + parent->live |= REG_LIVE_READ64 > > + state = parent > > + parent = state->parent > > + > > +Note: details about REG_LIVE_READ32 are omitted. > > + > > +Also note: the read marks are applied to the *parent* state while write marks > > +are applied to the *current* state. > > + > > +Because stack writes could have different sizes ``REG_LIVE_WRITTEN`` marks are > > +applied conservatively: stack spills are marked as written only if write size > > +corresponds to the size of the register, see function ``save_register_state()`` > > +for an example. > > so this paragraph mentions a very subtle interaction for stack slots > that only get partial register spill and keep other bytes as > STACK_MISC. I don't know how much into details we should go her, but > maybe calling out explicitly that this has implications when we mix > register spill and scalar data in a single 8-byte stack slot would be > useful? I can add the following text: --- 8< ----------------------------- Consider the following example:: 0: (*u64)(r10 - 8) = 0 --- checkpoint #0 --- 1: (*u32)(r10 - 8) = 1 2: r1 = (*u32)(r10 - 8) Because write size at instruction (1) is smaller than register size the write mark will not be added to fp-8 slot when (1) is verified, thus the fp-8 read at instruction (2) would propagate read mark for fp-8 up to checkpoint #0. ----------------------------- >8 --- wdyt? > > > + > > +Once ``BPF_EXIT`` instruction is reached function ``update_branch_counts()`` is > > +called to update the ``->branches`` counter for each verifier state in a chain > > +of parent verifier states. When ``->branches`` counter reaches zero the verifier > > +state becomes a valid entry in a set of cached verifier states. > > + > > +Each entry of the verifier states cache is post-processed by a function > > +``clean_live_states()``. This function marks all registers and stack spills > > +without ``REG_LIVE_READ{32,64}`` marks as ``NOT_INIT`` or ``STACK_INVALID``. > > +Registers/stack spills marked in this way are ignored in function ``stacksafe()`` > > +called from ``states_equal()`` when state cache entry is considered for > > +equivalence with a current state. > > + > > +Now it is possible to explain how the example from the beginning of the section > > +works:: > > + > > + 0: call bpf_get_prandom_u32() > > + 1: r1 = 0 > > + 2: if r0 == 0 goto +1 > > + 3: r0 = 1 > > + --- checkpoint[0] --- > > + 4: r0 = r1 > > + 5: exit > > + > > +* At instruction #2 branching point is reached and state ``{ r0 == 0, r1 == 0, pc == 4 }`` > > + is pushed to states processing queue (pc stands for program counter). > > +* At instruction #4: > > + > > + * ``checkpoint[0]`` states cache entry is created: ``{ r0 == 1, r1 == 0, pc == 4 }``; > > + * ``checkpoint[0].r0`` is marked as written; > > + * ``checkpoint[0].r1`` is marked as read; > > + > > +* At instruction #5 exit is reached and ``checkpoint[0]`` can now be processed > > + by ``clean_live_states()``, after this processing ``checkpoint[0].r0`` has a > > + read mark and all other registers and stack spills are marked as ``NOT_INIT`` > > + or ``STACK_INVALID`` > > +* The state ``{ r0 == 0, r1 == 0, pc == 4 }`` is popped from the states queue > > + and is compared against a cached state ``{ r1 == 0, pc == 4 }``, the states > > + are considered equivalent. > > + > > +Read marks propagation for cache hits > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +Another important point is handling of read marks when a previously verified > > +state is found in the states cache. All read marks present on registers and > > +stack spills of the cached state must be propagated over the parentage chain of > > +the current state. Function ``propagate_liveness()`` handles this case. > > + > > +For example, consider the following state parentage chain:: > > + > > + +---+ > > + A ---> | B | ---> exit > > + | | > > + C ---> | D | > > + +---+ > > + > > +* Chain ``A -> B -> exit`` is verified first; > > +* State ``B`` has read marks for registers ``r1`` and ``r2``; > > +* State ``D`` is considered equivalent to state ``B``; > > + > > +* Read marks for ``r1`` and ``r2`` have to be added in state ``C``, otherwise > > + state ``C`` might get mistakenly marked as equivalent to some future state > > + ``E`` with different values for registers in question. > > So I'm a bit confused by this section. We are talking about read marks > in B and D, and then imply that C has to get it as well, but we don't > mention what happens in A. The implication here is probably that read > marks that were bubbled up into A should be bubbled up into C, right? > But that's not very obvious? Could there be cases where read marks in > B are screened in A and never make into it? In that case similar thing > will presumably happen for C? I agree it might be confusing, what about the following text instead: --- 8< ----------------------------- For example, consider the following state parentage chain (S is a starting state, A-E are derived states, -> arrows show which state is derived from which):: r1 read <------------- A[r1] == 0 +---+ C[r1] == 0 S ---> A ---> | B | ---> exit E[r1] == 1 | | | ` ---> C ---> | D | | +---+ ` ---> E ^ ^ |___ suppose all these | states are at insn #Y suppose all these states are at insn #X * Chain of states ``S -> A -> B -> exit`` is verified first. * While ``B -> exit`` is verified register ``r1`` is read and this read mark is propagated up to state ``A``. * When chain of states ``C -> D`` is verified the state ``D`` turns out to be equivalent to state ``B``. * The read mark for ``r1`` has to be propagated to state ``C``, otherwise state ``C`` might get mistakenly marked as equivalent to state ``E`` even though values for register ``r1`` differ between ``C`` and ``E``. ----------------------------- >8 --- The reason I don't want to put an example program here is that example programs I can come up with are even more confusing than explanation above. > I'd also call out conceptual similarity of this after the fact > propagation of liveness marks to precision marks. It's the same idea > and need. Could you please suggest wording for this point? > > > + > > Understanding eBPF verifier messages > > ==================================== > > > > -- > > 2.39.0 > >