From: Jiri Olsa
> Sent: 11 December 2024 13:34
>
> Putting together all the previously added pieces to support optimized
> uprobes on top of 5-byte nop instruction.
>
> The current uprobe execution goes through following:
> - installs breakpoint instruction over original instruction
> - exception handler hit and calls related uprobe consumers
> - and either simulates original instruction or does out of line single step
>   execution of it
> - returns to user space
>
> The optimized uprobe path
>
> - checks the original instruction is 5-byte nop (plus other checks)
> - adds (or uses existing) user space trampoline and overwrites original
>   instruction (5-byte nop) with call to user space trampoline
> - the user space trampoline executes uprobe syscall that calls related uprobe
>   consumers
> - trampoline returns back to next instruction
...

How on earth can you safely overwrite a randomly aligned 5-byte instruction
that might be being prefetched and executed by another thread of the
same process?

If the instruction doesn't cross a cache line boundary then you might
manage to convince people that an 8-byte write will always be atomic
with respect to other CPUs reading instructions.
But you can't guarantee the alignment.

You might manage with the 7-byte sequence:
	br .+7; call addr
and then update 'addr' before changing the branch offset from 05 to 00.
But even that may not be safe if 'addr' crosses a cache line boundary.

You could replace a one-byte nop (0x90) with a breakpoint (0xcc) and then
return to the instruction after the breakpoint.
That would save having to emulate or single step the overwritten
instruction.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
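
For illustration only, the alignment concern above can be sketched as a small
address check in C. This is not code from the patch series: the helper names
are hypothetical, and the 64-byte cache line size is an assumption (typical
for current x86-64 parts, but not guaranteed). It only shows when a 5-byte
instruction straddles a cache line, and when it would fit inside a single
naturally aligned 8-byte word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on most current x86-64 CPUs. */
#define CACHE_LINE_SIZE 64u
/* Length of the 5-byte nop that the optimized path would overwrite. */
#define NOP5_LEN 5u

/* True if [addr, addr + len) spans more than one 'boundary'-sized block. */
static bool crosses_boundary(uintptr_t addr, unsigned len, unsigned boundary)
{
	assert(len > 0 && boundary > 0);
	return (addr / boundary) != ((addr + len - 1) / boundary);
}

/* Hypothetical helper: does a 5-byte instruction at addr straddle two lines? */
static bool nop5_crosses_cache_line(uintptr_t addr)
{
	return crosses_boundary(addr, NOP5_LEN, CACHE_LINE_SIZE);
}

/*
 * Hypothetical helper: could all 5 bytes be covered by one naturally
 * aligned 8-byte store? Only true when the instruction starts at
 * offset 0..3 within its 8-byte word.
 */
static bool nop5_fits_aligned_qword(uintptr_t addr)
{
	return !crosses_boundary(addr, NOP5_LEN, 8u);
}
```

With these helpers, a nop at 0x103c straddles a 64-byte line (0x103c to
0x1040), while one at 0x1004 stays within its line but no longer fits in a
single aligned 8-byte word. Whether an aligned 8-byte store is actually safe
against concurrent instruction fetch is a separate question that this check
does not answer.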