On Thu, Oct 10, 2024 at 3:40 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > On Thu, 2024-10-10 at 15:13 -0700, Andrii Nakryiko wrote: > > On Wed, Oct 9, 2024 at 6:09 PM Alexei Starovoitov > > [...] > > > > Something should be done about: > > > 71.25% [k] __mark_chain_precision > > > 24.81% [k] bt_sync_linked_regs > > > as well. > > > The algorithm there needs some tweaks. > > > > If we were to store bpf_jmp_history_entry for each instruction (and we > > can do that efficiently, memory-wise, I had the patch), and then for > > each instruction we maintained a list of "input" regs/slots and > > corresponding "output" regs/slots as we simulate each instruction > > forward, I think __mark_chain_precision would be much simpler and thus > > faster. We'd basically just walk backwards instruction by instruction, > > check if any of the output regs/slots need to be precise (few bitmasks > > intersection), and if yes, set all input regs/slots as "need > > precision", and just continue forward. > > > > I think it's actually a simpler approach and should be faster. Simpler > > because it's easy to tell inputs/outputs while doing forward > > instruction processing. Faster because __mark_chain_precision would > > only do very simple operation without lots of branching and checks. > > I think this would bring significant speedup. > Not sure it would completely fix the issue at hand, > as mark_chain_precision() walks like 100 instructions back on each > iteration of the loop, but it might be a step in the right direction. > > Do you mind if I refresh your old patches for jump history, > or do you want to work on this yourself? No, I don't mind at all. Go ahead. > > [...]