Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Sat, 1 Mar 2025 16:09:40 -0800

On Fri, Feb 28, 2025 at 8:40 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Fri, 2025-02-28 at 18:10 -0800, Alexei Starovoitov wrote:
>
> [...]
>
> > I think the end goal is to get rid of mark_reg_read() and
> > switch to proper live reg analysis.
> > So please include the numbers to see how much work left.
>
> Complete removal of mark_reg_read() means that analysis needs to be
> done for stack slots as well. The algorithm to handle stack slots is
> much more complicated:
> - it needs to track register / stack slot type to handle cases like
>   "r1 = r10" and spills of the stack pointer to stack;
> - it needs to track register values, at-least crudely, to handle cases
>   like "r1 = r10; r1 += r2;" (array access).

Doing this kind of register movement tracking before do_check()
may be difficult indeed.
Can we do this use/def tracking inline similar to current liveness,
but without ->parent logic.
Using postorder array that this patch adds ?

verifier states are path sensitive and more accurate
while this one will be insn based, but maybe good enough ?

> The worst case scenario, as you suggested, is just to assume stack
> slots live, but it is a big verification performance hit.
> Exact numbers are at the end of the email.
>
> > Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> > iirc we did it to support fine grain mark_insn_zext
> > to help architectures where zext has to be inserted by JIT.
> > I'm not sure whether new liveness has to do it as well.
>
> As far as I understand, this is important for one check in
> propagate_liveness(). And that check means something like:
> "if this register was read as 64-bit value, remember that
>  it needs zero extension on 32-bit load".
>
> Meaning that either DFA would need to track this bit of information
> (should be simple), or more zero extensions would be added.

Right. New liveness doesn't break zext, but makes it worse
for arch that needs it. We would need to quantify the impact.
iirc it was noticeable enough that we added this support.

>
> ---
>
> Repository [1] shared in cover letter was used for benchmarks below.
> Abbreviations are as follows:
> - Name: dfa-opts
>   Commit: b73005452a4a
>   Meaning: DFA as shared in this patch-set + a set of small
>            improvements which I decided to exclude from the
>            patch-set as described in the cover letter.
> - Name: dfa-opts-no-rm
>   Commit: e486757fdada
>   Meaning: dfa-opts + read marks are disabled for registers.
> - Name: dfa-opts-no-rm-sl
>   Commit: a9930e8127a9
>   Meaning: dfa-opts + read marks are disabled for registers
>            and stack.
>
> [1] https://github.com/eddyz87/bpf/tree/liveregs-dfa-std-liveregs-off
>
> Veristat output is filtered using -f "states_pct>5" -f "!insns<200".
> Veristat results are followed by a histogram that accounts for all
> tests.
>
> Two comparisons are made:
> - dfa-opts vs dfa-opts-no-rm (small negative impact, except two
>   sched_ext programs that hit 1M instructions limit; positive impact
>   would have indicated a bug);

Let's figure out what is causing rusty_init[_task]
to explode.
And proceed with this set in parallel.

> - dfa-opts vs dfa-opts-no-rm-sl (big negative impact).

I don't read it as a big negative.
cls_redirect and balancer_ingress need to be understood,
but nothing exploded to hit 1M insns,
so hopefully bare minimum stack tracking would do the trick.

So in terms of priorities, let's land this set, then
figure out rusty_init,
figure out read32 vs 64 for zext,
at that time we may land -no-rm.
Then stack tracking.