Re: [PATCH v6] arm64: implement ftrace with regs

Torsten Duwe <duwe@xxxxxx> · Fri, 4 Jan 2019 23:41:45 +0100

On Fri, Jan 04, 2019 at 01:06:48PM -0500, Steven Rostedt wrote:
> On Fri, 4 Jan 2019 17:50:18 +0000
> Mark Rutland <mark.rutland@xxxxxxx> wrote:
> 
> > At Linux Plumbers, I had a conversation with Steve Rostedt, and we came
> > to the conclusion that (withut heavyweight synchronization) patching two
> > NOPs at runtime isn't safe, since a CPU might have executed the first
> > NOP as a NOP before another CPU patches both instructions. So a CPU
> > might execute:
> > 
> > 	NOP
> > 	BL	ftrace_regs_caller
> > 
> > ... rather than the expected:
> > 
> > 	MOV	X9, X30
> > 	BL	ftrace_regs_caller
> > 
> > ... and therefore X9 contains some UNKNOWN value, rather than the
> > original LR value.

I'm perfectly aware of that; an earlier version had barriers, attempting
to avoid just that, which Mark(?) wrote weren't neccessary.

But is this a realistic scenario? All function entries are aligned 8 bytes.
Are there arm64 implementations out there that fetch only 4 bytes and
give a chance to mess with the 2nd 4 bytes? You at arm.com should know, and
I won't be surprised if the answer is a weird "yes". Or maybe it's just
another erratum lurking somewhere...

My point is: those 2 insn will _never_ be split by any alignment
boundary > 8; does that mean anything, have you considered this?

> > I wonder if we could solve that by patching the kernel at build-time, to
> > add the MOV X9, X30 in place of the first NOP. If we were to do that, we
> > could also update the addresses to pooint at the second NOP, simplifying
> > the changes to the runtime code.
> 
> You can also patch it at boot up when there's only one CPU running, and
> interrupts are disabled.

May I remind about possible performance hits? Even the NOPs had a tiny impact
on certain in-order implementations. I'd rather switch between the mov and
a "b +2".

	Torsten