On Fri, Jan 04, 2019 at 01:06:48PM -0500, Steven Rostedt wrote:
> On Fri, 4 Jan 2019 17:50:18 +0000
> Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> > At Linux Plumbers, I had a conversation with Steve Rostedt, and we came
> > to the conclusion that (without heavyweight synchronization) patching two
> > NOPs at runtime isn't safe, since a CPU might have executed the first
> > NOP as a NOP before another CPU patches both instructions. So a CPU
> > might execute:
> >
> >	NOP
> >	BL	ftrace_regs_caller
> >
> > ... rather than the expected:
> >
> >	MOV	X9, X30
> >	BL	ftrace_regs_caller
> >
> > ... and therefore X9 contains some UNKNOWN value, rather than the
> > original LR value.

I'm perfectly aware of that; an earlier version had barriers, attempting
to avoid just that, which Mark(?) wrote weren't necessary.

But is this a realistic scenario? All function entries are aligned to
8 bytes. Are there arm64 implementations out there that fetch only 4 bytes
and give a chance to mess with the 2nd 4 bytes? You at arm.com should know,
and I won't be surprised if the answer is a weird "yes". Or maybe it's just
another erratum lurking somewhere...

My point is: those 2 insns will _never_ be split by any alignment
boundary > 8; does that mean anything, have you considered this?

> > I wonder if we could solve that by patching the kernel at build-time, to
> > add the MOV X9, X30 in place of the first NOP. If we were to do that, we
> > could also update the addresses to point at the second NOP, simplifying
> > the changes to the runtime code.
>
> You can also patch it at boot up when there's only one CPU running, and
> interrupts are disabled.

May I remind you of possible performance hits? Even the NOPs had a tiny
impact on certain in-order implementations. I'd rather switch between
the mov and a "b +2".

	Torsten
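
To make the last suggestion concrete, here is a minimal sketch in C of one
reading of it: the second slot keeps its instruction, and only the first
32-bit word is ever rewritten, toggling between "mov x9, x30" and a branch
that skips both slots. The struct and helper names are hypothetical and are
not taken from the kernel or from the patch under discussion; only the A64
instruction encodings are architectural. The sketch models the patch site as
plain data and says nothing about cache maintenance or about whether this
pair of instructions may be modified concurrently with execution.

```c
#include <assert.h>
#include <stdint.h>

/* A64 encodings; every A64 instruction is one 32-bit word. */
#define A64_NOP        0xd503201fu /* nop */
#define A64_MOV_X9_X30 0xaa1e03e9u /* mov x9, x30 (alias of orr x9, xzr, x30) */

/* Encode "b" to pc + insns * 4. Hypothetical helper, forward offsets only. */
static uint32_t a64_b_forward(uint32_t insns)
{
	return 0x14000000u | (insns & 0x03ffffffu);
}

/* Hypothetical model of the two instruction slots at a function entry. */
struct patch_site {
	uint32_t insn[2];
};

/* Tracing off: branch over both slots, so insn[1] is never executed. */
static void site_disable(struct patch_site *s)
{
	s->insn[0] = a64_b_forward(2);
}

/* Tracing on: save LR in x9, then fall through into insn[1]. */
static void site_enable(struct patch_site *s)
{
	s->insn[0] = A64_MOV_X9_X30;
}

/* Does execution entering at insn[0] reach insn[1]? */
static int second_slot_reachable(const struct patch_site *s)
{
	return s->insn[0] != a64_b_forward(2);
}
```

In this model each state change is a single aligned 32-bit store to insn[0],
so a CPU entering the function sees either the branch or the mov in the first
slot, and never the NOP-then-BL pairing described in the quoted text.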