On Thu, 2024-10-10 at 15:13 -0700, Andrii Nakryiko wrote: > On Wed, Oct 9, 2024 at 6:09 PM Alexei Starovoitov [...] > > Something should be done about: > > 71.25% [k] __mark_chain_precision > > 24.81% [k] bt_sync_linked_regs > > as well. > > The algorithm there needs some tweaks. > > If we were to store bpf_jmp_history_entry for each instruction (and we > can do that efficiently, memory-wise, I had the patch), and then for > each instruction we maintained a list of "input" regs/slots and > corresponding "output" regs/slots as we simulate each instruction > forward, I think __mark_chain_precision would be much simpler and thus > faster. We'd basically just walk backwards instruction by instruction, > check if any of the output regs/slots need to be precise (few bitmasks > intersection), and if yes, set all input regs/slots as "need > precision", and just continue forward. > > I think it's actually a simpler approach and should be faster. Simpler > because it's easy to tell inputs/outputs while doing forward > instruction processing. Faster because __mark_chain_precision would > only do very simple operation without lots of branching and checks. I think this would bring significant speedup. Not sure it would completely fix the issue at hand, as mark_chain_precision() walks like 100 instructions back on each iteration of the loop, but it might be a step in the right direction. Do you mind if I refresh your old patches for jump history, or do you want to work on this yourself? [...]