Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis

On Fri, 2025-02-28 at 18:10 -0800, Alexei Starovoitov wrote:

[...]

> I think the end goal is to get rid of mark_reg_read() and
> switch to proper live reg analysis.
> So please include the numbers to see how much work left.

Complete removal of mark_reg_read() means that analysis needs to be
done for stack slots as well. The algorithm to handle stack slots is
much more complicated:
- it needs to track register / stack slot type to handle cases like
  "r1 = r10" and spills of the stack pointer to stack;
- it needs to track register values, at least crudely, to handle cases
  like "r1 = r10; r1 += r2;" (array access).

The worst-case fallback, as you suggested, is to simply assume that all
stack slots are live, but that is a big verification performance hit.
Exact numbers are at the end of the email.

> Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> iirc we did it to support fine grain mark_insn_zext
> to help architectures where zext has to be inserted by JIT.
> I'm not sure whether new liveness has to do it as well.

As far as I understand, this is important for one check in
propagate_liveness(). And that check means something like:
"if this register was read as 64-bit value, remember that
 it needs zero extension on 32-bit load".

This means that either the DFA would need to track this bit of
information (should be simple), or more zero extensions would be added.

---

The repository [1] shared in the cover letter was used for the benchmarks below.
Abbreviations are as follows:
- Name: dfa-opts
  Commit: b73005452a4a
  Meaning: DFA as shared in this patch-set + a set of small
           improvements which I decided to exclude from the
           patch-set as described in the cover letter.
- Name: dfa-opts-no-rm
  Commit: e486757fdada
  Meaning: dfa-opts + read marks are disabled for registers.
- Name: dfa-opts-no-rm-sl
  Commit: a9930e8127a9
  Meaning: dfa-opts + read marks are disabled for registers
           and stack.

[1] https://github.com/eddyz87/bpf/tree/liveregs-dfa-std-liveregs-off

Veristat output is filtered using -f "states_pct>5" -f "!insns<200".
Veristat results are followed by a histogram that accounts for all
tests.

Two comparisons are made:
- dfa-opts vs dfa-opts-no-rm (small negative impact, except for two
  sched_ext programs that hit the 1M instruction limit; a positive
  impact would have indicated a bug);
- dfa-opts vs dfa-opts-no-rm-sl (big negative impact).

========= selftests: dfa-opts vs dfa-opts-no-rm =========

File                      Program           States (A)  States (B)  States (DIFF)
------------------------  ----------------  ----------  ----------  -------------
test_l4lb_noinline.bpf.o  balancer_ingress         219         231   +12 (+5.48%)

Total progs: 3565
Old success: 2054
New success: 2054
States diff min:    0.00%
States diff max:    5.48%
   0% ..    5%: 3564
   5% ..   10%: 1

========= scx: dfa-opts vs dfa-opts-no-rm =========

File       Program          States (A)  States (B)  States      (DIFF)
---------  ---------------  ----------  ----------  ------------------
bpf.bpf.o  rusty_init             1944       55004  +53060 (+2729.42%)
bpf.bpf.o  rusty_init_task        1732       55049  +53317 (+3078.35%)

Total progs: 216
Old success: 186
New success: 184
States diff min:    0.00%
States diff max: 3078.35%
   0% ..    5%: 214
2725% .. 3080%: 2



========= selftests: dfa-opts vs dfa-opts-no-rm-sl =========

File                              Program                               States (A)  States (B)  States     (DIFF)
--------------------------------  ------------------------------------  ----------  ----------  -----------------
arena_htab_asm.bpf.o              arena_htab_asm                                33          40       +7 (+21.21%)
bpf_cubic.bpf.o                   bpf_cubic_cong_avoid                          92          98        +6 (+6.52%)
bpf_flow.bpf.o                    flow_dissector_0                              66         125      +59 (+89.39%)
bpf_iter_ksym.bpf.o               dump_ksym                                     16          21       +5 (+31.25%)
profiler1.bpf.o                   kprobe__proc_sys_write                        84         140      +56 (+66.67%)
profiler1.bpf.o                   kprobe__vfs_link                             504         543       +39 (+7.74%)
profiler1.bpf.o                   kprobe__vfs_symlink                          238         466     +228 (+95.80%)
profiler1.bpf.o                   kprobe_ret__do_filp_open                     247         274      +27 (+10.93%)
profiler1.bpf.o                   raw_tracepoint__sched_process_exec           139         350    +211 (+151.80%)
profiler1.bpf.o                   raw_tracepoint__sched_process_exit            67          86      +19 (+28.36%)
profiler1.bpf.o                   tracepoint__syscalls__sys_enter_kill         649         758     +109 (+16.80%)
profiler2.bpf.o                   kprobe__vfs_link                             149         257     +108 (+72.48%)
profiler2.bpf.o                   kprobe_ret__do_filp_open                     106         120      +14 (+13.21%)
profiler2.bpf.o                   raw_tracepoint__sched_process_exec           126         140      +14 (+11.11%)
profiler3.bpf.o                   kprobe__vfs_link                             805        1182     +377 (+46.83%)
pyperf180.bpf.o                   on_event                                   10564       17659    +7095 (+67.16%)
pyperf50.bpf.o                    on_event                                    2489        3375     +886 (+35.60%)
pyperf600_iter.bpf.o              on_event                                     192         214      +22 (+11.46%)
pyperf_subprogs.bpf.o             on_event                                    2331        2514      +183 (+7.85%)
setget_sockopt.bpf.o              skops_sockopt                                429         458       +29 (+6.76%)
setget_sockopt.bpf.o              socket_post_create                            90          95        +5 (+5.56%)
sock_iter_batch.bpf.o             iter_tcp_soreuse                               3           5       +2 (+66.67%)
strobemeta_bpf_loop.bpf.o         on_event                                     209         331     +122 (+58.37%)
test_bpf_nf.bpf.o                 nf_skb_ct_test                                41          56      +15 (+36.59%)
test_bpf_nf.bpf.o                 nf_xdp_ct_test                                41          56      +15 (+36.59%)
test_cls_redirect.bpf.o           cls_redirect                                2175       14083  +11908 (+547.49%)
test_cls_redirect_dynptr.bpf.o    cls_redirect                                 220         327     +107 (+48.64%)
test_cls_redirect_subprogs.bpf.o  cls_redirect                                4390       17001  +12611 (+287.27%)
test_l4lb.bpf.o                   balancer_ingress                             137         256     +119 (+86.86%)
test_l4lb_noinline.bpf.o          balancer_ingress                             219         643    +424 (+193.61%)
test_l4lb_noinline_dynptr.bpf.o   balancer_ingress                              73         182    +109 (+149.32%)
test_misc_tcp_hdr_options.bpf.o   misc_estab                                    88          98      +10 (+11.36%)
test_pkt_access.bpf.o             test_pkt_access                               21          25       +4 (+19.05%)
test_sock_fields.bpf.o            egress_read_sock_fields                       20          29       +9 (+45.00%)
test_tc_neigh_fib.bpf.o           tc_dst                                        12          14       +2 (+16.67%)
test_tc_neigh_fib.bpf.o           tc_src                                        12          14       +2 (+16.67%)
test_tcp_custom_syncookie.bpf.o   tcp_custom_syncookie                         420         560     +140 (+33.33%)
test_tcp_hdr_options.bpf.o        estab                                        189         225      +36 (+19.05%)
test_xdp.bpf.o                    _xdp_tx_iptunnel                              17          18        +1 (+5.88%)
test_xdp_dynptr.bpf.o             _xdp_tx_iptunnel                              26          36      +10 (+38.46%)
test_xdp_loop.bpf.o               _xdp_tx_iptunnel                              19          20        +1 (+5.26%)
test_xdp_noinline.bpf.o           balancer_ingress_v4                          271        1080    +809 (+298.52%)
test_xdp_noinline.bpf.o           balancer_ingress_v6                          268        1030    +762 (+284.33%)
xdp_features.bpf.o                xdp_do_tx                                     10          13       +3 (+30.00%)
xdp_synproxy_kern.bpf.o           syncookie_tc                                 390         467      +77 (+19.74%)
xdp_synproxy_kern.bpf.o           syncookie_xdp                                384         450      +66 (+17.19%)

Total progs: 3565
Old success: 2054
New success: 2054
States diff min:   -9.09%
States diff max:  547.49%
 -10% ..    0%: 3
   0% ..    5%: 3492
   5% ..   10%: 10
  10% ..   15%: 8
  15% ..   20%: 10
  20% ..   25%: 6
  25% ..   35%: 8
  35% ..   40%: 4
  45% ..   50%: 3
  50% ..   55%: 4
  55% ..   70%: 4
  70% ..   90%: 3
  95% ..  105%: 3
 145% ..  195%: 3
 280% ..  300%: 3
 545% ..  550%: 1

========= scx: dfa-opts vs dfa-opts-no-rm-sl =========

File            Program             States (A)  States (B)  States      (DIFF)
--------------  ------------------  ----------  ----------  ------------------
bpf.bpf.o       bpfland_enqueue             18          20        +2 (+11.11%)
bpf.bpf.o       bpfland_select_cpu          83         103       +20 (+24.10%)
bpf.bpf.o       flash_select_cpu            30          49       +19 (+63.33%)
bpf.bpf.o       lavd_cpu_offline           303         360       +57 (+18.81%)
bpf.bpf.o       lavd_cpu_online            303         360       +57 (+18.81%)
bpf.bpf.o       lavd_dispatch             7065       10652     +3587 (+50.77%)
bpf.bpf.o       lavd_init                  480         554       +74 (+15.42%)
bpf.bpf.o       lavd_running                89          94         +5 (+5.62%)
bpf.bpf.o       lavd_select_cpu            451         483        +32 (+7.10%)
bpf.bpf.o       layered_dispatch           501         950      +449 (+89.62%)
bpf.bpf.o       layered_dump               237         258        +21 (+8.86%)
bpf.bpf.o       layered_enqueue           1290        1655      +365 (+28.29%)
bpf.bpf.o       layered_init               423         552      +129 (+30.50%)
bpf.bpf.o       layered_select_cpu         201         311      +110 (+54.73%)
bpf.bpf.o       p2dq_dispatch               53         116      +63 (+118.87%)
bpf.bpf.o       rusty_init                1944       55006  +53062 (+2729.53%)
bpf.bpf.o       rusty_init_task           1732       55052  +53320 (+3078.52%)
bpf.bpf.o       rusty_running               19          23        +4 (+21.05%)
bpf.bpf.o       rusty_select_cpu           108         227     +119 (+110.19%)
bpf.bpf.o       rusty_set_cpumask          313         479      +166 (+53.04%)
scx_nest.bpf.o  nest_select_cpu             49          53         +4 (+8.16%)

Total progs: 216
Old success: 186
New success: 184
States diff min:    0.00%
States diff max: 3078.52%
   0% ..    5%: 186
   5% ..   10%: 4
  10% ..   15%: 5
  15% ..   20%: 6
  20% ..   25%: 3
  25% ..   55%: 6
  60% ..  115%: 3
 115% .. 3080%: 3





