OK, 7 days have passed, let's see how we are doing here...

On Wed, Oct 16, 2024 at 12:35 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> Hello,
>
> I wanted to provide a bit of a context about and tie together a few
> separate work streams (across a few separate kernel trees) all
> revolving around uprobe improvements, as there are a bunch of them and
> I'm sure it's hard to keep track of all of them. And hopefully I can
> also get Peter and ARM maintainer's input on some specific questions I
> asked below. Thank you in advance!
>
> In short, in the last few months there was a high activity around
> fixing and improving uprobes. All this is the result of increased and
> more varied use of uprobes/uretprobe in production settings. Uprobe
> performance is **very** important, and yes, we do have real use cases
> that go to millions per second uprobe/uretprobe triggering throughput,
> unfortunately. So any small bit of performance and scalability
> improvement is helpful. No, this isn't just some nerdy perf
> optimization work (I've been asked this a few times, so I thought I'd
> emphasize this again).
>
> So, we've already landed a bunch of work, mainly (not an exhaustive list):
>
> - various clean ups, API improvements, and bug fixes from Oleg
> Nesterov ([0], [1]). This simplified internal APIs and was a
> prerequisite of the rest of the work;
> - changes to refcounting and RCU-ifying of uprobe lifetime from me
> ([2]). This improved single-threaded performance somewhat, but mainly
> significantly improved scalability in the presence of multiple CPUs
> triggering lots of uprobes;
> - ARM64-specific optimization of uprobe emulation of NOP instruction
> by Liao Chang ([3]). This change alone gives 2x (!) speed up for a
> USDT tracing use cases *on ARM64* (we already have this optimization
> in x86-64);
> - there was a bit earlier work by Jiri Olsa ([4]) to add uretprobe()
> syscall, giving +30% speed ups.
>
> And there are a few more outstanding changes:
>
> - Jiri Olsa's uprobe "session" support ([5]). This is less
> performance focused, but important functionality by itself. But I'm
> calling this out here because the first two patches are pure uprobe
> internal changes, and I believe they should go into tip/perf/core to
> avoid conflicts with the rest of pending uprobe changes.
>
> Peter, do you mind applying those two and creating a stable tag for
> bpf-next to pull? We'll apply the rest of Jiri's series to
> bpf-next/master.

Jiri has reposted the patches, this time CC'ing Peter, heh :). It would
be great to apply those two patches and get a stable tag. This is
blocking the landing of uprobe sessions in bpf-next, and my remaining
patches will most probably be based on top of Jiri's uprobe changes.
Peter, please take another look, thank you.

>
> - Liao Chang's ARM64-specific STP instruction emulation support
> ([6]). This one will give 2x (!) improvement for a common case of
> having STP instruction being a first instruction in traced user
> function (similar to NOP for USDTs).
>
> ARM64 maintainers (cc'ed Catalin, Will, and Mark), can you guys please
> take another look? This one was a bit more controversial, but
> hopefully there is a way to massage it to be acceptable and not
> introduce unnecessary slowdowns (there were some concerns about memory
> ordering/visibility, which hopefully don't apply to uprobe cases).
> It's an important improvement, I'd really appreciate it if we can make
> progress here, thank you!
>

Ping. ARM64 folks, can you please take a look and reply? Thank you.

> - my speculative VMA-to-uprobe lookup series ([7]). This makes entry
> uprobe scalability scale linearly with the number of CPUs (the
> ultimate goal of uprobe scalability work).
>
> I think it's ready to go in. It has **implicit** dependency on
> Christian Brauner's recent change for FMODE_BACKING, for which he
> provided a stable tag. Peter, do you have any remaining concerns or
> this can be also merged soon?

No changes, still ready to go in. It might need a rebase if Jiri's
patches are applied.

>
> - another patch set of mine, switching uretprobe fast path to SRCU
> (with timeout) ([8]). This makes return uprobes (uretprobes) linearly
> scalable in the common case (again, the ultimate scalability goal).
>
> I haven't gotten much feedback here, would love to get some objective
> review here. This is an important counterpart to the speculative
> VMA-to-uprobe lookup series. Both are needed in practice.
>

This is the only thing that has progressed, thank you. I'll apply the
suggested state changes, but I intend to postpone the
delayed_uprobe_lock rework to a separate follow-up patch set. Just a
heads up.

> - patch set dropping unnecessary siglock usage in uprobe by Liao
> Chang ([9]). This one removes yet another lock, for a less common case
> (at least on x86-64) of single-stepped uprobe (where the probed
> instruction can't be emulated).
>
> This one needs a rebase, but it was already acked by Oleg. Liao,
> please prioritize the rebase and send v4 ASAP, so this is not lost.
>

This was rebased and acked by Masami. It seems to be ready to be applied.

>
> As you can see, lots of stuff needs to be landed and most of it is in
> good shape already. I'd love to hear thoughts of relevant people
> called out above, thank you!
>
>
> [0] https://lore.kernel.org/linux-trace-kernel/20240729134444.GA12293@xxxxxxxxxx/
> [1] https://lore.kernel.org/linux-trace-kernel/20240929144201.GA9429@xxxxxxxxxx/
> [2] https://lore.kernel.org/linux-trace-kernel/20240903174603.3554182-1-andrii@xxxxxxxxxx/
> [3] https://lore.kernel.org/linux-trace-kernel/20240909071114.1150053-1-liaochang1@xxxxxxxxxx/
> [4] https://lore.kernel.org/linux-trace-kernel/20240523121149.575616-1-jolsa@xxxxxxxxxx/
> [5] https://lore.kernel.org/bpf/20241015091050.3731669-1-jolsa@xxxxxxxxxx/
> [6] https://lore.kernel.org/linux-trace-kernel/20240910060407.1427716-1-liaochang1@xxxxxxxxxx/
> [7] https://lore.kernel.org/linux-trace-kernel/20241010205644.3831427-1-andrii@xxxxxxxxxx/
> [8] https://lore.kernel.org/linux-trace-kernel/20241008002556.2332835-1-andrii@xxxxxxxxxx/
> [9] https://lore.kernel.org/linux-trace-kernel/20240815014629.2685155-1-liaochang1@xxxxxxxxxx/
>
> -- Andrii