On Mon, Oct 30, 2023 at 10:03 PM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote: > > Now that precision backtracing is supporting register spill/fill to/from > stack, there is another oportunity to be exploited here: minimizing > precise STACK_ZERO cases. With a simple code change we can rely on > initially imprecise register spill tracking for cases when register > spilled to stack was a known zero. > > This is a very common case for initializing on the stack variables, > including rather large structures. Often times zero has no special > meaning for the subsequent BPF program logic and is often overwritten > with non-zero values soon afterwards. But due to STACK_ZERO vs > STACK_MISC tracking, such initial zero initialization actually causes > duplication of verifier states as STACK_ZERO is clearly different than > STACK_MISC or spilled SCALAR_VALUE register. > > The effect of this (now) trivial change is huge, as can be seen below. > These are differences between BPF selftests, Cilium, and Meta-internal > BPF object files relative to previous patch in this series. You can see > improvements ranging from single-digit percentage improvement for > instructions and states, all the way to 50-60% reduction for some of > Meta-internal host agent programs, and even some Cilium programs. > > For Meta-internal ones I left only the differences for largest BPF > object files by states/instructions, as there were too many differences > in the overall output. All the differences were improvements, reducting > number of states and thus instructions validated. > > Note, Meta-internal BPF object file names are not printed below. > Many copies of balancer_ingress are actually many different > configurations of Katran, so they are different BPF programs, which > explains state reduction going from -16% all the way to 31%, depending > on BPF program logic complexity. > > SELFTESTS > ========= > File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) > --------------------------------------- ----------------------- --------- --------- --------------- ---------- ---------- ------------- > bpf_iter_netlink.bpf.linked3.o dump_netlink 148 104 -44 (-29.73%) 8 5 -3 (-37.50%) > bpf_iter_unix.bpf.linked3.o dump_unix 8474 8404 -70 (-0.83%) 151 147 -4 (-2.65%) > bpf_loop.bpf.linked3.o stack_check 560 324 -236 (-42.14%) 42 24 -18 (-42.86%) > local_storage_bench.bpf.linked3.o get_local 120 77 -43 (-35.83%) 9 6 -3 (-33.33%) > loop6.bpf.linked3.o trace_virtqueue_add_sgs 10167 9868 -299 (-2.94%) 226 206 -20 (-8.85%) > pyperf600_bpf_loop.bpf.linked3.o on_event 4872 3423 -1449 (-29.74%) 322 229 -93 (-28.88%) > strobemeta.bpf.linked3.o on_event 180697 176036 -4661 (-2.58%) 4780 4734 -46 (-0.96%) > test_cls_redirect.bpf.linked3.o cls_redirect 65594 65401 -193 (-0.29%) 4230 4212 -18 (-0.43%) > test_global_func_args.bpf.linked3.o test_cls 145 136 -9 (-6.21%) 10 9 -1 (-10.00%) > test_l4lb.bpf.linked3.o balancer_ingress 4760 2612 -2148 (-45.13%) 113 102 -11 (-9.73%) > test_l4lb_noinline.bpf.linked3.o balancer_ingress 4845 4877 +32 (+0.66%) 219 221 +2 (+0.91%) > test_l4lb_noinline_dynptr.bpf.linked3.o balancer_ingress 2072 2087 +15 (+0.72%) 97 98 +1 (+1.03%) > test_seg6_loop.bpf.linked3.o __add_egr_x 12440 9975 -2465 (-19.82%) 364 353 -11 (-3.02%) > test_tcp_hdr_options.bpf.linked3.o estab 2558 2572 +14 (+0.55%) 179 180 +1 (+0.56%) > test_xdp_dynptr.bpf.linked3.o _xdp_tx_iptunnel 645 596 -49 (-7.60%) 26 24 -2 (-7.69%) > test_xdp_noinline.bpf.linked3.o balancer_ingress_v6 3520 3516 -4 (-0.11%) 216 216 +0 (+0.00%) > xdp_synproxy_kern.bpf.linked3.o syncookie_tc 82661 81241 -1420 (-1.72%) 5073 5155 +82 (+1.62%) > xdp_synproxy_kern.bpf.linked3.o syncookie_xdp 84964 82297 -2667 (-3.14%) 5130 5157 +27 (+0.53%) > > META-INTERNAL > ============= > Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) > -------------------------------------- --------- --------- ----------------- ---------- ---------- --------------- > balancer_ingress 27925 23608 -4317 (-15.46%) 1488 1482 -6 (-0.40%) > balancer_ingress 31824 27546 -4278 (-13.44%) 1658 1652 -6 (-0.36%) > balancer_ingress 32213 27935 -4278 (-13.28%) 1689 1683 -6 (-0.36%) > balancer_ingress 32213 27935 -4278 (-13.28%) 1689 1683 -6 (-0.36%) > balancer_ingress 31824 27546 -4278 (-13.44%) 1658 1652 -6 (-0.36%) > balancer_ingress 38647 29562 -9085 (-23.51%) 2069 1835 -234 (-11.31%) > balancer_ingress 38647 29562 -9085 (-23.51%) 2069 1835 -234 (-11.31%) > balancer_ingress 40339 30792 -9547 (-23.67%) 2193 1934 -259 (-11.81%) > balancer_ingress 37321 29055 -8266 (-22.15%) 1972 1795 -177 (-8.98%) > balancer_ingress 38176 29753 -8423 (-22.06%) 2008 1831 -177 (-8.81%) > balancer_ingress 29193 20910 -8283 (-28.37%) 1599 1422 -177 (-11.07%) > balancer_ingress 30013 21452 -8561 (-28.52%) 1645 1447 -198 (-12.04%) > balancer_ingress 28691 24290 -4401 (-15.34%) 1545 1531 -14 (-0.91%) > balancer_ingress 34223 28965 -5258 (-15.36%) 1984 1875 -109 (-5.49%) > balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%) > balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%) > balancer_ingress 35868 26455 -9413 (-26.24%) 2140 1827 -313 (-14.63%) > balancer_ingress 35868 26455 -9413 (-26.24%) 2140 1827 -313 (-14.63%) > balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%) > balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%) > balancer_ingress 34844 29485 -5359 (-15.38%) 2036 1918 -118 (-5.80%) > fbflow_egress 3256 2652 -604 (-18.55%) 218 192 -26 (-11.93%) > fbflow_ingress 1026 944 -82 (-7.99%) 70 63 -7 (-10.00%) > sslwall_tc_egress 8424 7360 -1064 (-12.63%) 498 458 -40 (-8.03%) > syar_accept_protect 15040 9539 -5501 (-36.58%) 364 220 -144 (-39.56%) > syar_connect_tcp_v6 15036 9535 -5501 (-36.59%) 360 216 -144 (-40.00%) > syar_connect_udp_v4 15039 9538 -5501 (-36.58%) 361 217 -144 (-39.89%) > syar_connect_connect4_protect4 24805 15833 -8972 (-36.17%) 756 480 -276 (-36.51%) > syar_lsm_file_open 167772 151813 -15959 (-9.51%) 1836 1667 -169 (-9.20%) > syar_namespace_create_new 14805 9304 -5501 (-37.16%) 353 209 -144 (-40.79%) > syar_python3_detect 17531 12030 -5501 (-31.38%) 391 247 -144 (-36.83%) > syar_ssh_post_fork 16412 10911 -5501 (-33.52%) 405 261 -144 (-35.56%) > syar_enter_execve 14728 9227 -5501 (-37.35%) 345 201 -144 (-41.74%) > syar_enter_execveat 14728 9227 -5501 (-37.35%) 345 201 -144 (-41.74%) > syar_exit_execve 16622 11121 -5501 (-33.09%) 376 232 -144 (-38.30%) > syar_exit_execveat 16622 11121 -5501 (-33.09%) 376 232 -144 (-38.30%) > syar_syscalls_kill 15288 9787 -5501 (-35.98%) 398 254 -144 (-36.18%) > syar_task_enter_pivot_root 14898 9397 -5501 (-36.92%) 357 213 -144 (-40.34%) > syar_syscalls_setreuid 16678 11177 -5501 (-32.98%) 429 285 -144 (-33.57%) > syar_syscalls_setuid 16678 11177 -5501 (-32.98%) 429 285 -144 (-33.57%) > syar_syscalls_process_vm_readv 14959 9458 -5501 (-36.77%) 364 220 -144 (-39.56%) > syar_syscalls_process_vm_writev 15757 10256 -5501 (-34.91%) 390 246 -144 (-36.92%) > do_uprobe 15519 10018 -5501 (-35.45%) 373 229 -144 (-38.61%) > edgewall 179715 55783 -123932 (-68.96%) 12607 3999 -8608 (-68.28%) > bictcp_state 7570 4131 -3439 (-45.43%) 496 269 -227 (-45.77%) > cubictcp_state 7570 4131 -3439 (-45.43%) 496 269 -227 (-45.77%) > tcp_rate_skb_delivered 447 272 -175 (-39.15%) 29 18 -11 (-37.93%) > kprobe__bbr_set_state 4566 2615 -1951 (-42.73%) 209 124 -85 (-40.67%) > kprobe__bictcp_state 4566 2615 -1951 (-42.73%) 209 124 -85 (-40.67%) > inet_sock_set_state 1501 1337 -164 (-10.93%) 93 85 -8 (-8.60%) > tcp_retransmit_skb 1145 981 -164 (-14.32%) 67 59 -8 (-11.94%) > tcp_retransmit_synack 1183 951 -232 (-19.61%) 67 55 -12 (-17.91%) > bpf_tcptuner 1459 1187 -272 (-18.64%) 99 80 -19 (-19.19%) > tw_egress 801 776 -25 (-3.12%) 69 66 -3 (-4.35%) > tw_ingress 795 770 -25 (-3.14%) 69 66 -3 (-4.35%) > ttls_tc_ingress 19025 19383 +358 (+1.88%) 470 465 -5 (-1.06%) > ttls_nat_egress 490 299 -191 (-38.98%) 33 20 -13 (-39.39%) > ttls_nat_ingress 448 285 -163 (-36.38%) 32 21 -11 (-34.38%) > tw_twfw_egress 511127 212071 -299056 (-58.51%) 16733 8504 -8229 (-49.18%) > tw_twfw_ingress 500095 212069 -288026 (-57.59%) 16223 8504 -7719 (-47.58%) > tw_twfw_tc_eg 511113 212064 -299049 (-58.51%) 16732 8504 -8228 (-49.18%) > tw_twfw_tc_in 500095 212069 -288026 (-57.59%) 16223 8504 -7719 (-47.58%) > tw_twfw_egress 12632 12435 -197 (-1.56%) 276 260 -16 (-5.80%) > tw_twfw_ingress 12631 12454 -177 (-1.40%) 278 261 -17 (-6.12%) > tw_twfw_tc_eg 12595 12435 -160 (-1.27%) 274 259 -15 (-5.47%) > tw_twfw_tc_in 12631 12454 -177 (-1.40%) 278 261 -17 (-6.12%) > tw_xdp_dump 266 209 -57 (-21.43%) 9 8 -1 (-11.11%) > > CILIUM > ========= > File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) > ------------- -------------------------------- --------- --------- ---------------- ---------- ---------- -------------- > bpf_host.o cil_to_netdev 6047 4578 -1469 (-24.29%) 362 249 -113 (-31.22%) > bpf_host.o handle_lxc_traffic 2227 1585 -642 (-28.83%) 156 103 -53 (-33.97%) > bpf_host.o tail_handle_ipv4_from_netdev 2244 1458 -786 (-35.03%) 163 106 -57 (-34.97%) > bpf_host.o tail_handle_nat_fwd_ipv4 21022 10479 -10543 (-50.15%) 1289 670 -619 (-48.02%) > bpf_host.o tail_handle_nat_fwd_ipv6 15433 11375 -4058 (-26.29%) 905 643 -262 (-28.95%) > bpf_host.o tail_ipv4_host_policy_ingress 2219 1367 -852 (-38.40%) 161 96 -65 (-40.37%) > bpf_host.o tail_nodeport_nat_egress_ipv4 22460 19862 -2598 (-11.57%) 1469 1293 -176 (-11.98%) > bpf_host.o tail_nodeport_nat_ingress_ipv4 5526 3534 -1992 (-36.05%) 366 243 -123 (-33.61%) > bpf_host.o tail_nodeport_nat_ingress_ipv6 5132 4256 -876 (-17.07%) 241 219 -22 (-9.13%) > bpf_host.o tail_nodeport_nat_ipv6_egress 3702 3542 -160 (-4.32%) 215 205 -10 (-4.65%) > bpf_lxc.o tail_handle_nat_fwd_ipv4 21022 10479 -10543 (-50.15%) 1289 670 -619 (-48.02%) > bpf_lxc.o tail_handle_nat_fwd_ipv6 15433 11375 -4058 (-26.29%) 905 643 -262 (-28.95%) > bpf_lxc.o tail_ipv4_ct_egress 5073 3374 -1699 (-33.49%) 262 172 -90 (-34.35%) > bpf_lxc.o tail_ipv4_ct_ingress 5093 3385 -1708 (-33.54%) 262 172 -90 (-34.35%) > bpf_lxc.o tail_ipv4_ct_ingress_policy_only 5093 3385 -1708 (-33.54%) 262 172 -90 (-34.35%) > bpf_lxc.o tail_ipv6_ct_egress 4593 3878 -715 (-15.57%) 194 151 -43 (-22.16%) > bpf_lxc.o tail_ipv6_ct_ingress 4606 3891 -715 (-15.52%) 194 151 -43 (-22.16%) > bpf_lxc.o tail_ipv6_ct_ingress_policy_only 4606 3891 -715 (-15.52%) 194 151 -43 (-22.16%) > bpf_lxc.o tail_nodeport_nat_ingress_ipv4 5526 3534 -1992 (-36.05%) 366 243 -123 (-33.61%) > bpf_lxc.o tail_nodeport_nat_ingress_ipv6 5132 4256 -876 (-17.07%) 241 219 -22 (-9.13%) > bpf_overlay.o tail_handle_nat_fwd_ipv4 20524 10114 -10410 (-50.72%) 1271 638 -633 (-49.80%) > bpf_overlay.o tail_nodeport_nat_egress_ipv4 22718 19490 -3228 (-14.21%) 1475 1275 -200 (-13.56%) > bpf_overlay.o tail_nodeport_nat_ingress_ipv4 5526 3534 -1992 (-36.05%) 366 243 -123 (-33.61%) > bpf_overlay.o tail_nodeport_nat_ingress_ipv6 5132 4256 -876 (-17.07%) 241 219 -22 (-9.13%) > bpf_overlay.o tail_nodeport_nat_ipv6_egress 3638 3548 -90 (-2.47%) 209 203 -6 (-2.87%) > bpf_overlay.o tail_rev_nodeport_lb4 4368 3820 -548 (-12.55%) 248 215 -33 (-13.31%) > bpf_overlay.o tail_rev_nodeport_lb6 2867 2428 -439 (-15.31%) 167 140 -27 (-16.17%) > bpf_sock.o cil_sock6_connect 1718 1703 -15 (-0.87%) 100 99 -1 (-1.00%) > bpf_xdp.o tail_handle_nat_fwd_ipv4 12917 12443 -474 (-3.67%) 875 849 -26 (-2.97%) > bpf_xdp.o tail_handle_nat_fwd_ipv6 13515 13264 -251 (-1.86%) 715 702 -13 (-1.82%) > bpf_xdp.o tail_lb_ipv4 39492 36367 -3125 (-7.91%) 2430 2251 -179 (-7.37%) > bpf_xdp.o tail_lb_ipv6 80441 78058 -2383 (-2.96%) 3647 3523 -124 (-3.40%) > bpf_xdp.o tail_nodeport_ipv6_dsr 1038 901 -137 (-13.20%) 61 55 -6 (-9.84%) > bpf_xdp.o tail_nodeport_nat_egress_ipv4 13027 12096 -931 (-7.15%) 868 809 -59 (-6.80%) > bpf_xdp.o tail_nodeport_nat_ingress_ipv4 7617 5900 -1717 (-22.54%) 522 413 -109 (-20.88%) > bpf_xdp.o tail_nodeport_nat_ingress_ipv6 7575 7395 -180 (-2.38%) 383 374 -9 (-2.35%) > bpf_xdp.o tail_rev_nodeport_lb4 6808 6739 -69 (-1.01%) 403 396 -7 (-1.74%) > bpf_xdp.o tail_rev_nodeport_lb6 16173 15847 -326 (-2.02%) 1010 990 -20 (-1.98%) > So I also want to mention that while I did spot check a few programs (not the biggest ones) and they did seem to have correct verification flow, I obviously can't easily validate verifier log_level=2 logs for all of the changes above, especially those multi-thousand state programs. I'd really appreciate someone from Isovalent/Cilium to do some checking of the Cilium program or two for sanity, just in case. Thanks! > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx> > --- > kernel/bpf/verifier.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 8cfe060e4938..e42ce974b106 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -4668,8 +4668,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, > return err; > > mark_stack_slot_scratched(env, spi); > - if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && > - !register_is_null(reg) && env->bpf_capable) { > + if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && env->bpf_capable) { > save_register_state(env, state, spi, reg, size); > /* Break the relation on a narrowing spill. */ > if (fls64(reg->umax_value) > BITS_PER_BYTE * size) > @@ -4718,7 +4717,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, > /* when we zero initialize stack slots mark them as such */ > if ((reg && register_is_null(reg)) || > (!reg && is_bpf_st_mem(insn) && insn->imm == 0)) { > - /* backtracking doesn't work for STACK_ZERO yet. */ > + /* STACK_ZERO case happened because register spill > + * wasn't properly aligned at the stack slot boundary, > + * so it's not a register spill anymore; force > + * originating register to be precise to make > + * STACK_ZERO correct for subsequent states > + */ > err = mark_chain_precision(env, value_regno); > if (err) > return err; > -- > 2.34.1 >