On 31/01/2023 18:11, Eduard Zingerman wrote: > This is a followup for [1], adds an overview for the register liveness > tracking, covers the following points: > - why register liveness tracking is useful; > - how register parentage chains are constructed; > - how liveness marks are applied using the parentage chains. > > [1] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@xxxxxxxxxxxxxx/ > > Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx> > --- > Documentation/bpf/verifier.rst | 280 +++++++++++++++++++++++++++++++++ > 1 file changed, 280 insertions(+) ... > + Current +-------------------------------+ > + state | r0 | r1-r5 | r6-r9 | fp-8 ... | > + +-------------------------------+ > + \ > + r6 read mark is propagated via > + these links all the way up to > + checkpoint #1. Perhaps explicitly mention here that the reason it doesn't propagate to checkpoint #0 (despite the arrow) is that there's a write mark on c1[r6]. Also worth mentioning somewhere that write marks are really a property of the arrow, not the state — a write mark in c#1 tells us that the straight-line code from c#0 to c#1 contains a write (thus 'breaking' that arrow for read mark propagation); it lives in c#1's data structures because it's c#1 that needs to 'know' about it, whereas c#0 (and its parents) need to 'know' about any *reads* in the straight-line code from c#0 to c#1 (but these are of no interest to c#1). I sometimes summarise this with the phrase "read up, write down", though idk how useful that is to anyone outside of my head ;) > +Liveness marks tracking > +======================= > + > +For each processed instruction, the verifier propagates information about reads > +up the parentage chain and saves information about writes in the current state. > +The information about reads is propagated by function ``mark_reg_read()`` which > +could be summarized as follows:: > + > + mark_reg_read(struct bpf_reg_state *state): > + parent = state->parent > + while parent: > + if parent->live & REG_LIVE_WRITTEN: > + break This isn't correct; we look at `state->live` here, because if in the straight-line code since the last checkpoint (parent) there's a write to this register, then reads should not propagate to `parent`. Then there's the complication of the `writes` variable in mark_reg_read(); that's explained by the comment on propagate_liveness(), which AFAICT you don't cover in your doc section about that. (And note that `writes` is only ever false for the first iteration through the mark_reg_read() loop). > + if parent->live & REG_LIVE_READ64: > + break > + parent->live |= REG_LIVE_READ64 > + state = parent > + parent = state->parent > + > +Note: details about REG_LIVE_READ32 are omitted. > + > +Also note: the read marks are applied to the **parent** state while write marks > +are applied to the **current** state. May be worth stating that the principle of the algorithm is that read marks propagate back along the chain until they hit a write mark, which 'screens off' earlier states from the read. Your doc implies this but afaict never states it explicitly, and I think it makes the algo easier to understand for someone who doesn't already know what it's all about. Apart from that, this is great. I particularly like your diagram of the parentage chains. -ed