On Wed, Nov 6, 2024 at 1:12 AM Liao, Chang <liaochang1@xxxxxxxxxx> wrote:
>
> On 2024/9/27 17:45, Liao Chang wrote:
> >> 2 files changed, 139 insertions(+), 42 deletions(-)
> >>
> > Liao,
> >
> > Assuming your ARM64 improvements go through, would you still need
> > these changes? XOL case is a slow case and if possible should be
> > avoided at all costs. If all common cases for ARM64 are covered
> > through instruction emulation, would we need to add all this
> > complexity to optimize slow case?
>
> Andrii,
>
> I've studied the optimizations merged over the past month, and it seems
> that part of the problem addressed in this patch has been resolved
> by Oleg (uprobes: kill xol_area->slot_count). And I hope you've received
> the email with the re-run results for -push using simulated STP on
> the latest kernel (tag next-20241104). It shows significant improvements,
> although there's still room to match the throughput of -nop and -ret.
> So based on these results, I would prioritize the STP simulation patch.

Great, I was hoping that Oleg's patches would help. And yes, I
absolutely agree, STP simulation to avoid the kernel->user->kernel switch
is probably the biggest bang for the buck for ARM64 specifically now.
Can you please send the fastest simulation approach that works like
x86-64, and we can try to continue the conversation on the refreshed
patch?

>
> --
> BR
> Liao, Chang
>