On Sat, 2025-03-01 at 16:09 -0800, Alexei Starovoitov wrote:
> On Fri, Feb 28, 2025 at 8:40 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:

[...]

> > Complete removal of mark_reg_read() means that analysis needs to be
> > done for stack slots as well. The algorithm to handle stack slots is
> > much more complicated:
> > - it needs to track register / stack slot type to handle cases like
> >   "r1 = r10" and spills of the stack pointer to stack;
> > - it needs to track register values, at least crudely, to handle cases
> >   like "r1 = r10; r1 += r2;" (array access).
>
> Doing this kind of register movement tracking before do_check()
> may be difficult indeed.
> Can we do this use/def tracking inline similar to current liveness,
> but without ->parent logic,
> using the postorder array that this patch adds?
> verifier states are path sensitive and more accurate
> while this one will be insn based, but maybe good enough?

You mean act like precision tracking? Whenever an instruction is
verified and a use is recorded, propagate this use upwards along the
execution path, updating live-in/live-out sets for instructions?

The problem here is termination (when to consider live-in/live-out
sets final). The DFA computation stops as soon as live-in/live-out
marks stop changing. Idk how this condition should look for the
scheme above.

[...]

> > > Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> > > iirc we did it to support fine grain mark_insn_zext
> > > to help architectures where zext has to be inserted by JIT.
> > > I'm not sure whether new liveness has to do it as well.
> >
> > As far as I understand, this is important for one check in
> > propagate_liveness(). And that check means something like:
> > "if this register was read as a 64-bit value, remember that
> > it needs zero extension on 32-bit load".
> >
> > Meaning that either the DFA would need to track this bit of
> > information (should be simple), or more zero extensions would be
> > added.
>
> Right.
> New liveness doesn't break zext, but makes it worse
> for arch that needs it. We would need to quantify the impact.
> iirc it was noticeable enough that we added this support.

I'm surprised that no test_progs or test_verifier tests are broken.
Agree that this needs to be quantified.

[...]

> > Two comparisons are made:
> > - dfa-opts vs dfa-opts-no-rm (small negative impact, except two
> >   sched_ext programs that hit the 1M instructions limit; positive
> >   impact would have indicated a bug);
>
> Let's figure out what is causing rusty_init[_task]
> to explode.
> And proceed with this set in parallel.

Will do.

> > - dfa-opts vs dfa-opts-no-rm-sl (big negative impact).
>
> I don't read it as a big negative.
> cls_redirect and balancer_ingress need to be understood,
> but nothing exploded to hit 1M insns,
> so hopefully bare minimum stack tracking would do the trick.
>
> So in terms of priorities, let's land this set, then
> figure out rusty_init,
> figure out read32 vs 64 for zext,
> at that time we may land -no-rm.
> Then stack tracking.

Tbh, I think that landing dfa-opts-no-rm separately from
dfa-opts-no-rm-sl doesn't make things much simpler. The register
chain based liveness computation would still be a thing. I'd try to
figure out how to make the dfa-opts-no-rm-sl variant faster first.
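To make the termination condition I mentioned above concrete, here is
a toy sketch (not verifier code; all names invented, registers as
bitmasks) of the textbook backward liveness iteration. It stops
exactly when a full pass changes no live-in/live-out set; termination
is guaranteed because the sets only ever grow:

```c
/* Toy liveness fixed point, NOT verifier.c code.
 *
 *   live_in[i]  = use[i] | (live_out[i] & ~def[i])
 *   live_out[i] = union of live_in over successors of i
 *
 * Caller zero-initializes live_in/live_out; iteration repeats until
 * a full pass changes nothing (the fixed point).
 */

#define MAX_SUCC 2

struct insn {
	unsigned use;       /* registers read, as a bitmask */
	unsigned def;       /* registers written, as a bitmask */
	int succ[MAX_SUCC]; /* successor insn indices, -1 = none */
};

/* Returns the number of passes needed to reach the fixed point
 * (the last pass is the one that observes no change). */
static int compute_liveness(struct insn *prog, int n,
			    unsigned *live_in, unsigned *live_out)
{
	int passes = 0, changed = 1;

	while (changed) {
		changed = 0;
		passes++;
		/* reverse order converges faster for a backward problem */
		for (int i = n - 1; i >= 0; i--) {
			unsigned out = 0, in;

			for (int s = 0; s < MAX_SUCC; s++)
				if (prog[i].succ[s] >= 0)
					out |= live_in[prog[i].succ[s]];
			in = prog[i].use | (out & ~prog[i].def);
			if (in != live_in[i] || out != live_out[i]) {
				live_in[i] = in;
				live_out[i] = out;
				changed = 1;
			}
		}
	}
	return passes;
}
```

Since each set can only gain bits and there are at most
(regs x insns) bits total, the number of passes is bounded, which is
the termination argument the per-path propagation scheme would lack.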
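Re the 32 vs 64 read bit for zext: in a DFA setting it could probably
ride along as a second mask with the same transfer function. Again
just a sketch with invented names, not a proposal for verifier.c:

```c
/* Toy sketch, NOT verifier code: track "read at all" and "read as
 * 64-bit" as two masks that go through the same backward transfer.
 * A 32-bit def whose register is in read64 at its live-out would
 * need an explicit zero extension inserted by the JIT. */

struct live_sets {
	unsigned live;   /* registers read at any width */
	unsigned read64; /* registers read as full 64-bit values */
};

/* Transfer function for one instruction, component-wise the same
 * shape as plain liveness. use64 is the subset of `use` that reads
 * the full 64 bits. */
static struct live_sets transfer(unsigned use, unsigned use64,
				 unsigned def, struct live_sets out)
{
	struct live_sets in = {
		.live   = use   | (out.live   & ~def),
		.read64 = use64 | (out.read64 & ~def),
	};
	return in;
}

/* After the fixed point: does a 32-bit def of the registers in
 * def32_mask need zero extension, given its live-out sets? */
static int needs_zext(unsigned def32_mask, struct live_sets out)
{
	return (out.read64 & def32_mask) != 0;
}
```

The point being that the extra bit does not change the fixed-point
structure at all, it only doubles the state per instruction, which
matches the "should be simple" guess above.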