On Sat, 2025-03-01 at 16:09 -0800, Alexei Starovoitov wrote:
> On Fri, Feb 28, 2025 at 8:40 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:

[...]

> > Complete removal of mark_reg_read() means that analysis needs to be
> > done for stack slots as well. The algorithm to handle stack slots is
> > much more complicated:
> > - it needs to track register / stack slot type to handle cases like
> >   "r1 = r10" and spills of the stack pointer to stack;
> > - it needs to track register values, at least crudely, to handle cases
> >   like "r1 = r10; r1 += r2;" (array access).
>
> Doing this kind of register movement tracking before do_check()
> may be difficult indeed.
> Can we do this use/def tracking inline similar to current liveness,
> but without ->parent logic,
> using the postorder array that this patch adds?
> verifier states are path sensitive and more accurate
> while this one will be insn based, but maybe good enough?

You mean act like precision tracking? Whenever an instruction is
verified and a use is recorded, propagate this use upwards along the
execution path, updating live-in/live-out sets for instructions?

The problem here is termination (when to consider live-in/live-out
sets final). The DFA computation stops as soon as live-in/live-out
marks stop changing. Idk how this condition should look for the
scheme above.

[...]

> > > Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> > > iirc we did it to support fine grain mark_insn_zext
> > > to help architectures where zext has to be inserted by JIT.
> > > I'm not sure whether new liveness has to do it as well.
> >
> > As far as I understand, this is important for one check in
> > propagate_liveness(). And that check means something like:
> > "if this register was read as a 64-bit value, remember that
> > it needs zero extension on 32-bit load".
> >
> > Meaning that either the DFA would need to track this bit of
> > information (should be simple), or more zero extensions would be
> > added.
>
> Right.
> New liveness doesn't break zext, but makes it worse
> for arch that needs it. We would need to quantify the impact.
> iirc it was noticeable enough that we added this support.

I'm surprised that no test_progs or test_verifier tests are broken.
Agree that this needs to be quantified.

[...]

> > Two comparisons are made:
> > - dfa-opts vs dfa-opts-no-rm (small negative impact, except two
> >   sched_ext programs that hit the 1M instructions limit; positive
> >   impact would have indicated a bug);
>
> Let's figure out what is causing rusty_init[_task]
> to explode.
> And proceed with this set in parallel.

Will do.

> > - dfa-opts vs dfa-opts-no-rm-sl (big negative impact).
>
> I don't read it as a big negative.
> cls_redirect and balancer_ingress need to be understood,
> but nothing exploded to hit 1M insns,
> so hopefully bare minimum stack tracking would do the trick.
>
> So in terms of priorities, let's land this set, then
> figure out rusty_init,
> figure out read32 vs 64 for zext,
> at that time we may land -no-rm.
> Then stack tracking.

Tbh, I think that landing dfa-opts-no-rm separately from
dfa-opts-no-rm-sl doesn't make things much simpler. The register
chain based liveness computation would still be a thing. I'd try to
figure out how to make the dfa-opts-no-rm-sl variant faster first.
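To make the termination condition I mentioned above concrete, here is
a toy sketch (not verifier code; all names invented, registers as
bitmasks) of the textbook backward liveness iteration. It stops
exactly when a full pass changes no live-in/live-out set; termination
is guaranteed because the sets only ever grow:

```c
/* Toy liveness fixed point, NOT verifier.c code.
 *
 *   live_in[i]  = use[i] | (live_out[i] & ~def[i])
 *   live_out[i] = union of live_in over successors of i
 *
 * Caller zero-initializes live_in/live_out; iteration repeats until
 * a full pass changes nothing (the fixed point).
 */

#define MAX_SUCC 2

struct insn {
	unsigned use;       /* registers read, as a bitmask */
	unsigned def;       /* registers written, as a bitmask */
	int succ[MAX_SUCC]; /* successor insn indices, -1 = none */
};

/* Returns the number of passes needed to reach the fixed point
 * (the last pass is the one that observes no change). */
static int compute_liveness(struct insn *prog, int n,
			    unsigned *live_in, unsigned *live_out)
{
	int passes = 0, changed = 1;

	while (changed) {
		changed = 0;
		passes++;
		/* reverse order converges faster for a backward problem */
		for (int i = n - 1; i >= 0; i--) {
			unsigned out = 0, in;

			for (int s = 0; s < MAX_SUCC; s++)
				if (prog[i].succ[s] >= 0)
					out |= live_in[prog[i].succ[s]];
			in = prog[i].use | (out & ~prog[i].def);
			if (in != live_in[i] || out != live_out[i]) {
				live_in[i] = in;
				live_out[i] = out;
				changed = 1;
			}
		}
	}
	return passes;
}
```

Since each set can only gain bits and there are at most
(regs x insns) bits total, the number of passes is bounded, which is
the termination argument the per-path propagation scheme would lack.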
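Re the 32 vs 64 read bit for zext: in a DFA setting it could probably
ride along as a second mask with the same transfer function. Again
just a sketch with invented names, not a proposal for verifier.c:

```c
/* Toy sketch, NOT verifier code: track "read at all" and "read as
 * 64-bit" as two masks that go through the same backward transfer.
 * A 32-bit def whose register is in read64 at its live-out would
 * need an explicit zero extension inserted by the JIT. */

struct live_sets {
	unsigned live;   /* registers read at any width */
	unsigned read64; /* registers read as full 64-bit values */
};

/* Transfer function for one instruction, component-wise the same
 * shape as plain liveness. use64 is the subset of `use` that reads
 * the full 64 bits. */
static struct live_sets transfer(unsigned use, unsigned use64,
				 unsigned def, struct live_sets out)
{
	struct live_sets in = {
		.live   = use   | (out.live   & ~def),
		.read64 = use64 | (out.read64 & ~def),
	};
	return in;
}

/* After the fixed point: does a 32-bit def of the registers in
 * def32_mask need zero extension, given its live-out sets? */
static int needs_zext(unsigned def32_mask, struct live_sets out)
{
	return (out.read64 & def32_mask) != 0;
}
```

The point being that the extra bit does not change the fixed-point
structure at all, it only doubles the state per instruction, which
matches the "should be simple" guess above.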