On Sat, Feb 2, 2019 at 12:14 PM Balbir Singh <bsingharora@xxxxxxxxx> wrote: > > On Tue, Jan 22, 2019 at 10:57:21AM -0500, Joe Lawrence wrote: > > From: Nicolai Stange <nstange@xxxxxxx> > > > > The ppc64 specific implementation of the reliable stacktracer, > > save_stack_trace_tsk_reliable(), bails out and reports an "unreliable > > trace" whenever it finds an exception frame on the stack. Stack frames > > are classified as exception frames if the STACK_FRAME_REGS_MARKER magic, > > as written by exception prologues, is found at a particular location. > > > > However, as observed by Joe Lawrence, it is possible in practice that > > non-exception stack frames can alias with prior exception frames and thus, > > that the reliable stacktracer can find a stale STACK_FRAME_REGS_MARKER on > > the stack. It in turn falsely reports an unreliable stacktrace and blocks > > any live patching transition to finish. Said condition lasts until the > > stack frame is overwritten/initialized by function call or other means. > > > > In principle, we could mitigate this by making the exception frame > > classification condition in save_stack_trace_tsk_reliable() stronger: > > in addition to testing for STACK_FRAME_REGS_MARKER, we could also take into > > account that for all exceptions executing on the kernel stack > > - their stack frames's backlink pointers always match what is saved > > in their pt_regs instance's ->gpr[1] slot and that > > - their exception frame size equals STACK_INT_FRAME_SIZE, a value > > uncommonly large for non-exception frames. > > > > However, while these are currently true, relying on them would make the > > reliable stacktrace implementation more sensitive towards future changes in > > the exception entry code. Note that false negatives, i.e. not detecting > > exception frames, would silently break the live patching consistency model. > > > > Furthermore, certain other places (diagnostic stacktraces, perf, xmon) > > rely on STACK_FRAME_REGS_MARKER as well. > > > > Make the exception exit code clear the on-stack STACK_FRAME_REGS_MARKER > > for those exceptions running on the "normal" kernel stack and returning > > to kernelspace: because the topmost frame is ignored by the reliable stack > > tracer anyway, returns to userspace don't need to take care of clearing > > the marker. > > > > Furthermore, as I don't have the ability to test this on Book 3E or > > 32 bits, limit the change to Book 3S and 64 bits. > > > > Finally, make the HAVE_RELIABLE_STACKTRACE Kconfig option depend on > > PPC_BOOK3S_64 for documentation purposes. Before this patch, it depended > > on PPC64 && CPU_LITTLE_ENDIAN and because CPU_LITTLE_ENDIAN implies > > PPC_BOOK3S_64, there's no functional change here. > > > > Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model") > > Reported-by: Joe Lawrence <joe.lawrence@xxxxxxxxxx> > > Signed-off-by: Nicolai Stange <nstange@xxxxxxx> > > Signed-off-by: Joe Lawrence <joe.lawrence@xxxxxxxxxx> > > --- > > arch/powerpc/Kconfig | 2 +- > > arch/powerpc/kernel/entry_64.S | 7 +++++++ > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > > index 2890d36eb531..73bf87b1d274 100644 > > --- a/arch/powerpc/Kconfig > > +++ b/arch/powerpc/Kconfig > > @@ -220,7 +220,7 @@ config PPC > > select HAVE_PERF_USER_STACK_DUMP > > select HAVE_RCU_TABLE_FREE if SMP > > select HAVE_REGS_AND_STACK_ACCESS_API > > - select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN > > + select HAVE_RELIABLE_STACKTRACE if PPC_BOOK3S_64 && CPU_LITTLE_ENDIAN > > select HAVE_SYSCALL_TRACEPOINTS > > select HAVE_VIRT_CPU_ACCOUNTING > > select HAVE_IRQ_TIME_ACCOUNTING > > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S > > index 435927f549c4..a2c168b395d2 100644 > > --- a/arch/powerpc/kernel/entry_64.S > > +++ b/arch/powerpc/kernel/entry_64.S > > @@ -1002,6 +1002,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > > ld r2,_NIP(r1) > > mtspr SPRN_SRR0,r2 > > > > + /* > > + * Leaving a stale exception_marker on the stack can confuse > > + * the reliable stack unwinder later on. Clear it. > > + */ > > + li r2,0 > > + std r2,STACK_FRAME_OVERHEAD-16(r1) > > + > > Could you please double check, r4 is already 0 at this point > IIUC. So the change might be a simple > > std r4,STACK_FRAME_OVERHEAD-16(r1) > r4 is not 0, sorry for the noise Balbir