On Thu, Nov 21, 2024 at 8:34 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Nov 21, 2024 at 08:02:12AM -0800, Alexei Starovoitov wrote:
> > On Thu, Nov 21, 2024 at 4:17 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Wed, Nov 20, 2024 at 04:07:38PM -0800, Andrii Nakryiko wrote:
> > >
> > > > USDTs are meant to be "transparent" to the surrounding code and they
> > > > don't mark any clobbered registers. Technically it could be added, but
> > > > I'm not a fan of this.
> > >
> > > Sure. Anyway, another thing to consider is FRED, will all of this still
> > > matter once that lands? If FRED gets us INT3 performance close to what
> > > SYSCALL has, then all this work will go unused.
> >
> > afaik not a single cpu in the datacenter supports FRED while
> > uprobe overhead is real.
> > imo it's worth improving performance today for existing cpus.
>
> I understand, but OTOH adding a syscall now, that we'll have to maintain
> for years and years, even though we know it'll not be used much is a
> bit annoying.

No. It _will_ be used for years.

>
> > I suspect arm64 might benefit too. Even if arm hw does the same
> > amount of work for trap vs syscall the sw overhead of handling
> > trap is different.
>
> Well, the RISC CPUs have a much harder time using this, their immediate
> range is typically puny and they end up needing multiple instructions
> and some register in order to set up a call.

We don't care about 32-bit archs and other exotics.
They're not the reasons to leave performance on the table
on dominant archs.

> Elsewhere in the thread Mark Rutland already noted that arm64 really
> doesn't need or want this.

Doesn't look like you've read what you quoted above.
On arm64 the _HW_ cost may be the same.
The _SW_ difference in handling trap vs syscall is real.
I bet once uprobe syscall is benchmarked on arm64 there will be a delta.