Michael Ellerman <mpe@xxxxxxxxxxxxxx> writes:

> Joe Lawrence <joe.lawrence@xxxxxxxxxx> writes:
>> From: Nicolai Stange <nstange@xxxxxxx>
>>
>> The ppc64 specific implementation of the reliable stacktracer,
>> save_stack_trace_tsk_reliable(), bails out and reports an "unreliable
>> trace" whenever it finds an exception frame on the stack. Stack frames
>> are classified as exception frames if the STACK_FRAME_REGS_MARKER magic,
>> as written by exception prologues, is found at a particular location.
>>
>> However, as observed by Joe Lawrence, it is possible in practice that
>> non-exception stack frames can alias with prior exception frames and thus,
>> that the reliable stacktracer can find a stale STACK_FRAME_REGS_MARKER on
>> the stack. It in turn falsely reports an unreliable stacktrace and blocks
>> any live patching transition to finish. Said condition lasts until the
>> stack frame is overwritten/initialized by function call or other means.
>>
>> In principle, we could mitigate this by making the exception frame
>> classification condition in save_stack_trace_tsk_reliable() stronger:
>> in addition to testing for STACK_FRAME_REGS_MARKER, we could also take into
>> account that for all exceptions executing on the kernel stack
>> - their stack frames's backlink pointers always match what is saved
>>   in their pt_regs instance's ->gpr[1] slot and that
>> - their exception frame size equals STACK_INT_FRAME_SIZE, a value
>>   uncommonly large for non-exception frames.
>>
>> However, while these are currently true, relying on them would make the
>> reliable stacktrace implementation more sensitive towards future changes in
>> the exception entry code. Note that false negatives, i.e. not detecting
>> exception frames, would silently break the live patching consistency model.
>>
>> Furthermore, certain other places (diagnostic stacktraces, perf, xmon)
>> rely on STACK_FRAME_REGS_MARKER as well.
>>
>> Make the exception exit code clear the on-stack STACK_FRAME_REGS_MARKER
>> for those exceptions running on the "normal" kernel stack and returning
>> to kernelspace: because the topmost frame is ignored by the reliable stack
>> tracer anyway, returns to userspace don't need to take care of clearing
>> the marker.
>>
>> Furthermore, as I don't have the ability to test this on Book 3E or
>> 32 bits, limit the change to Book 3S and 64 bits.
>>
>> Finally, make the HAVE_RELIABLE_STACKTRACE Kconfig option depend on
>> PPC_BOOK3S_64 for documentation purposes. Before this patch, it depended
>> on PPC64 && CPU_LITTLE_ENDIAN and because CPU_LITTLE_ENDIAN implies
>> PPC_BOOK3S_64, there's no functional change here.
>
> That has nothing to do with the fix and should really be in a separate
> patch.
>
> I can split it when applying.

If you don't mind, that would be nice! Or simply drop that
chunk... Otherwise, let me know if I shall send a split v2 for this
patch [1/4] only.

Thanks,

Nicolai

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)