I will send out the next version without frame->reliable as implementing a per-frame reliability thing obviously needs other changes and needs Mark Rutland's code reorg. Thanks! Madhavan On 6/25/21 10:39 AM, Madhavan T. Venkataraman wrote: > > > On 6/24/21 9:40 AM, Mark Rutland wrote: >> Hi Madhavan, >> >> On Wed, May 26, 2021 at 04:49:16PM -0500, madvenka@xxxxxxxxxxxxxxxxxxx wrote: >>> From: "Madhavan T. Venkataraman" <madvenka@xxxxxxxxxxxxxxxxxxx> >>> >>> The unwinder should check for the presence of various features and >>> conditions that can render the stack trace unreliable and mark the >>> the stack trace as unreliable for the benefit of the caller. >>> >>> Introduce the first reliability check - If a return PC is not a valid >>> kernel text address, consider the stack trace unreliable. It could be >>> some generated code. >>> >>> Other reliability checks will be added in the future. >>> >>> Signed-off-by: Madhavan T. Venkataraman <madvenka@xxxxxxxxxxxxxxxxxxx> >> >> At a high-level, I'm on-board with keeping track of this per unwind >> step, but if we do that then I want to be abel to use this during >> regular unwinds (e.g. so that we can have a backtrace idicate when a >> step is not reliable, like x86 does with '?'), and to do that we need to >> be a little more accurate. >> > > The only consumer of frame->reliable is livepatch. So, in retrospect, my > original per-frame reliability flag was an overkill. I was just trying to > provide extra per-frame debug information which is not really a requirement > for livepatch. > > So, let us separate the two. I will rename frame->reliable to frame->livepatch_safe. > This will apply to the whole stacktrace and not to every frame. > > Pass a livepatch_safe flag to start_backtrace(). This will be the initial value > of frame->livepatch_safe. So, if the caller knows that the starting frame is > unreliable, he can pass "false" to start_backtrace(). > > Whenever a reliability check fails, frame->livepatch_safe = false. After that > point, it will remain false till the end of the stacktrace. This keeps it simple. > > Also, once livepatch_safe is set to false, further reliability checks will not > be performed (what would be the point?). > > Finally, it might be a good idea to perform reliability checks even in > start_backtrace() so we don't assume that the starting frame is reliable even > if the caller passes livepatch_safe=true. What do you think? > >> I think we first need to do some more preparatory work for that, but >> regardless, I have some comments below. >> > > I agree that some more work is required to provide per-frame debug information > and tracking. That can be done later. It is not a requirement for livepatch. > >>> --- >>> arch/arm64/include/asm/stacktrace.h | 9 +++++++ >>> arch/arm64/kernel/stacktrace.c | 38 +++++++++++++++++++++++++---- >>> 2 files changed, 42 insertions(+), 5 deletions(-) >>> >>> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h >>> index eb29b1fe8255..4c822ef7f588 100644 >>> --- a/arch/arm64/include/asm/stacktrace.h >>> +++ b/arch/arm64/include/asm/stacktrace.h >>> @@ -49,6 +49,13 @@ struct stack_info { >>> * >>> * @graph: When FUNCTION_GRAPH_TRACER is selected, holds the index of a >>> * replacement lr value in the ftrace graph stack. >>> + * >>> + * @reliable: Is this stack frame reliable? There are several checks that >>> + * need to be performed in unwind_frame() before a stack frame >>> + * is truly reliable. Until all the checks are present, this flag >>> + * is just a place holder. Once all the checks are implemented, >>> + * this comment will be updated and the flag can be used by the >>> + * caller of unwind_frame(). >> >> I'd prefer that we state the high-level semantic first, then drill down >> into detail, e.g. >> >> | @reliable: Indicates whether this frame is beleived to be a reliable >> | unwinding from the parent stackframe. This may be set >> | regardless of whether the parent stackframe was reliable. >> | >> | This is set only if all the following are true: >> | >> | * @pc is a valid text address. >> | >> | Note: this is currently incomplete. >> > > I will change the name of the flag. I will change the comment accordingly. > >>> */ >>> struct stackframe { >>> unsigned long fp; >>> @@ -59,6 +66,7 @@ struct stackframe { >>> #ifdef CONFIG_FUNCTION_GRAPH_TRACER >>> int graph; >>> #endif >>> + bool reliable; >>> }; >>> >>> extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame); >>> @@ -169,6 +177,7 @@ static inline void start_backtrace(struct stackframe *frame, >>> bitmap_zero(frame->stacks_done, __NR_STACK_TYPES); >>> frame->prev_fp = 0; >>> frame->prev_type = STACK_TYPE_UNKNOWN; >>> + frame->reliable = true; >>> } >> >> I think we need more data than this to be accurate. >> >> Consider arch_stack_walk() starting from a pt_regs -- the initial state >> (the PC from the regs) is accurate, but the first unwind from that will >> not be, and we don't account for that at all. >> >> I think we need to capture an unwind type in struct stackframe, which we >> can pass into start_backtrace(), e.g. >> > >> | enum unwind_type { >> | /* >> | * The next frame is indicated by the frame pointer. >> | * The next unwind may or may not be reliable. >> | */ >> | UNWIND_TYPE_FP, >> | >> | /* >> | * The next frame is indicated by the LR in pt_regs. >> | * The next unwind is not reliable. >> | */ >> | UNWIND_TYPE_REGS_LR, >> | >> | /* >> | * We do not know how to unwind to the next frame. >> | * The next unwind is not reliable. >> | */ >> | UNWIND_TYPE_UNKNOWN >> | }; >> >> That should be simple enough to set up around start_backtrace(), but >> we'll need further rework to make that simple at exception boundaries. >> With the entry rework I have queued for v5.14, we're *almost* down to a >> single asm<->c transition point for all vectors, and I'm hoping to >> factor the remainder out to C for v5.15, whereupon we can annotate that >> BL with some metadata for unwinding (with something similar to x86's >> UNWIND_HINT, but retained for runtime). >> > > I understood UNWIND_TYPE_FP and UNWIND_TYPE_REGS_LR. When would UNWIND_TYPE_UNKNOWN > be passed to start_backtrace? Could you elaborate? > > Regardless, the above comment applies only to per-frame tracking when it is eventually > implemented. For livepatch, it is not needed. At exception boundaries, if stack metadata > is available, then use that to unwind safely. Else, livepatch_safe = false. The latter > is what is being done in my patch series. So, we can go with that until stack metadata > becomes available. > > For the UNWIND_TYPE_REGS_LR and UNWIND_TYPE_UNKNOWN cases, the caller will > pass livepatch_safe=false to start_backtrace(). For UNWIND_TYPE_FP, the caller will > pass livepatch_safe=true. So, only UNWIND_TYPE_FP matters for livepatch. > >>> >>> #endif /* __ASM_STACKTRACE_H */ >>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c >>> index d55bdfb7789c..9061375c8785 100644 >>> --- a/arch/arm64/kernel/stacktrace.c >>> +++ b/arch/arm64/kernel/stacktrace.c >>> @@ -44,21 +44,29 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame) >>> unsigned long fp = frame->fp; >>> struct stack_info info; >>> >>> + frame->reliable = true; >> >> I'd prefer to do this the other way around, e.g. here do: >> >> | /* >> | * Assume that an unwind step is unreliable until it has passed >> | * all relevant checks. >> | */ >> | frame->reliable = false; >> >> ... then only set this to true once we're certain the step is reliable. >> >> That requires fewer changes below, and would also be more robust as if >> we forget to update this we'd accidentally mark an entry as unreliable >> rather than accidentally marking it as reliable. >> > > For livepatch_safe, the initial statement setting it to true at the > beginning of unwind_frame() goes away. But whenever a reliability check fails, > livepatch_safe has to be set to false. > >>> + >>> /* Terminal record; nothing to unwind */ >>> if (!fp) >>> return -ENOENT; >>> >>> - if (fp & 0xf) >>> + if (fp & 0xf) { >>> + frame->reliable = false; >>> return -EINVAL; >>> + } >>> >>> if (!tsk) >>> tsk = current; >>> >>> - if (!on_accessible_stack(tsk, fp, &info)) >>> + if (!on_accessible_stack(tsk, fp, &info)) { >>> + frame->reliable = false; >>> return -EINVAL; >>> + } >>> >>> - if (test_bit(info.type, frame->stacks_done)) >>> + if (test_bit(info.type, frame->stacks_done)) { >>> + frame->reliable = false; >>> return -EINVAL; >>> + } >>> >>> /* >>> * As stacks grow downward, any valid record on the same stack must be >>> @@ -74,8 +82,10 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame) >>> * stack. >>> */ >>> if (info.type == frame->prev_type) { >>> - if (fp <= frame->prev_fp) >>> + if (fp <= frame->prev_fp) { >>> + frame->reliable = false; >>> return -EINVAL; >>> + } >>> } else { >>> set_bit(frame->prev_type, frame->stacks_done); >>> } >>> @@ -100,14 +110,32 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame) >>> * So replace it to an original value. >>> */ >>> ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++); >>> - if (WARN_ON_ONCE(!ret_stack)) >>> + if (WARN_ON_ONCE(!ret_stack)) { >>> + frame->reliable = false; >>> return -EINVAL; >>> + } >>> frame->pc = ret_stack->ret; >>> } >>> #endif /* CONFIG_FUNCTION_GRAPH_TRACER */ >>> >>> frame->pc = ptrauth_strip_insn_pac(frame->pc); >>> >>> + /* >>> + * Check the return PC for conditions that make unwinding unreliable. >>> + * In each case, mark the stack trace as such. >>> + */ >>> + >>> + /* >>> + * Make sure that the return address is a proper kernel text address. >>> + * A NULL or invalid return address could mean: >>> + * >>> + * - generated code such as eBPF and optprobe trampolines >>> + * - Foreign code (e.g. EFI runtime services) >>> + * - Procedure Linkage Table (PLT) entries and veneer functions >>> + */ >>> + if (!__kernel_text_address(frame->pc)) >>> + frame->reliable = false; >> >> I don't think we should mention PLTs here. They appear in regular kernel >> text, and on arm64 they are generally not problematic for unwinding. The >> case in which they are problematic are where they interpose an >> trampoline call that isn't following the AAPCS (e.g. ftrace calls from a >> module, or calls to __hwasan_tag_mismatch generally), and we'll have to >> catch those explciitly (or forbid RELIABLE_STACKTRACE with HWASAN). >> > > I will remove the mention of PLTs. > >> >From a backtrace perspective, the PC itself *is* reliable, but the next >> unwind from this frame will not be, so I'd like to mark this as >> reliable and the next unwind as unreliable. We can do that with the >> UNWIND_TYPE_UNKNOWN suggestion above. >> > > In the livepatch_safe approach, it can be set to false as soon as the unwinder > realizes that there is unreliability, even if the unreliability is in the next > frame. Actually, this would avoid one extra unwind step for livepatch. > >> For the comment here, how about: >> >> | /* >> | * If the PC is not a known kernel text address, then we cannot >> | * be sure that a subsequent unwind will be reliable, as we >> | * don't know that the code follows our unwind requirements. >> | */ >> | if (!__kernel_text_address(frame-pc)) >> | frame->unwind = UNWIND_TYPE_UNKNOWN; >> > > OK. I can change the comment. > > Thanks! > > Madhavan >