Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable

"Madhavan T. Venkataraman" <madvenka@xxxxxxxxxxxxxxxxxxx> · Tue, 23 Mar 2021 08:38:58 -0500

On 3/23/21 8:36 AM, Mark Rutland wrote:
> On Tue, Mar 23, 2021 at 07:56:40AM -0500, Madhavan T. Venkataraman wrote:
>>
>>
>> On 3/23/21 5:51 AM, Mark Rutland wrote:
>>> On Mon, Mar 15, 2021 at 11:57:57AM -0500, madvenka@xxxxxxxxxxxxxxxxxxx wrote:
>>>> From: "Madhavan T. Venkataraman" <madvenka@xxxxxxxxxxxxxxxxxxx>
>>>>
>>>> When CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled and tracing is activated
>>>> for a function, the ftrace infrastructure is called for the function at
>>>> the very beginning. Ftrace creates two frames:
>>>>
>>>> 	- One for the traced function
>>>>
>>>> 	- One for the caller of the traced function
>>>>
>>>> That gives a reliable stack trace while executing in the ftrace
>>>> infrastructure code. When ftrace returns to the traced function, the frames
>>>> are popped and everything is back to normal.
>>>>
>>>> However, in cases like live patch, execution is redirected to a different
>>>> function when ftrace returns. A stack trace taken while still in the ftrace
>>>> infrastructure code will not show the target function. The target function
>>>> is the real function that we want to track.
>>>>
>>>> So, if an FTRACE frame is detected on the stack, just mark the stack trace
>>>> as unreliable.
>>>
>>> To identify this case, please identify the ftrace trampolines instead,
>>> e.g. ftrace_regs_caller, return_to_handler.
>>>
>>
>> Yes. I will check this as part of the return address checking. IIUC, I need
>> to check for the inner labels that are defined at the points where the
>> instructions are patched for ftrace, e.g., ftrace_call and ftrace_graph_call.
>>
>> SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
>>         bl      ftrace_stub	<====================================
>>
>> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>> SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL) // ftrace_graph_caller();
>>         nop	<=======                // If enabled, this will be replaced
>>                                         // "b ftrace_graph_caller"
>> #endif
>>
>> For instance, the stack trace I got while tracing do_mmap() with the stack trace
>> tracer looks like this:
>>
>> 		 ...
>> [  338.911793]   trace_function+0xc4/0x160
>> [  338.911801]   function_stack_trace_call+0xac/0x130
>> [  338.911807]   ftrace_graph_call+0x0/0x4
>> [  338.911813]   do_mmap+0x8/0x598
>> [  338.911820]   vm_mmap_pgoff+0xf4/0x188
>> [  338.911826]   ksys_mmap_pgoff+0x1d8/0x220
>> [  338.911832]   __arm64_sys_mmap+0x38/0x50
>> [  338.911839]   el0_svc_common.constprop.0+0x70/0x1a8
>> [  338.911846]   do_el0_svc+0x2c/0x98
>> [  338.911851]   el0_svc+0x2c/0x70
>> [  338.911859]   el0_sync_handler+0xb0/0xb8
>> [  338.911864]   el0_sync+0x180/0x1c0
>>
>>> It'd be good to check *exactly* when we need to reject, since IIUC when
>>> we have a graph stack entry the unwind will be correct from livepatch's
>>> PoV.
>>>
>>
>> The current unwinder already handles this as follows:
>>
>> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>         if (tsk->ret_stack &&
>>                 (ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
>>                 struct ftrace_ret_stack *ret_stack;
>>                 /*
>>                  * This is a case where function graph tracer has
>>                  * modified a return address (LR) in a stack frame
>>                  * to hook a function return.
>>                  * So replace it to an original value.
>>                  */
>>                 ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
>>                 if (WARN_ON_ONCE(!ret_stack))
>>                         return -EINVAL;
>>                 frame->pc = ret_stack->ret;
>>         }
>> #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> 
> Beware that this handles the case where a function will return to
> return_to_handler, but doesn't handle unwinding from *within*
> return_to_handler, which we can't do reliably today, so that might need
> special handling.
> 

OK. I will take a look at this.

>> Is there anything else that needs handling here?
> 
> I wrote up a few cases to consider in:
> 
> https://www.kernel.org/doc/html/latest/livepatch/reliable-stacktrace.html
> 
> ... e.g. the "Obscuring of return addresses" case.
> 
> It might be that we're fine so long as we refuse to unwind across
> exception boundaries, but it needs some thought. We probably need to go
> over each of the trampolines instruction-by-instruction to consider
> that.
> 
> As mentioned above, within return_to_handler when we call
> ftrace_return_to_handler, there's a period where the real return address
> has been removed from the ftrace return stack, but hasn't yet been
> placed in x30, and wouldn't show up in a trace (e.g. if we could somehow
> hook the return from ftrace_return_to_handler).
> 
> We might be saved by the fact we'll mark traces across exception
> boundaries as unreliable, but I haven't thought very hard about it. We
> might want to explicitly reject unwinds within return_to_handler in case
> it's possible to interpose ftrace_return_to_handler somehow.
> 

OK. I will study the above.
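
As a rough illustration of the explicit rejection suggested above, here is a
minimal user-space sketch, not kernel code. The names code_range,
unreliable_ranges, pc_is_unreliable and trace_is_reliable are hypothetical, and
the addresses are placeholders standing in for the text ranges of
ftrace_regs_caller and return_to_handler. A real unwinder would presumably take
those bounds from the symbols themselves.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end) covering one trampoline. */
struct code_range {
	uintptr_t start;
	uintptr_t end;
};

/* Placeholder addresses for illustration only, not real kernel symbol values. */
static const struct code_range unreliable_ranges[] = {
	{ 0x1000, 0x1100 },	/* stands in for ftrace_regs_caller */
	{ 0x2000, 0x2040 },	/* stands in for return_to_handler */
};

/* Return true if pc lies inside any range the unwinder cannot handle reliably. */
static bool pc_is_unreliable(uintptr_t pc)
{
	size_t n = sizeof(unreliable_ranges) / sizeof(unreliable_ranges[0]);

	for (size_t i = 0; i < n; i++) {
		if (pc >= unreliable_ranges[i].start &&
		    pc < unreliable_ranges[i].end)
			return true;
	}
	return false;
}

/* A trace is treated as reliable only if no return address hits such a range. */
static bool trace_is_reliable(const uintptr_t *pcs, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (pc_is_unreliable(pcs[i]))
			return false;
	}
	return true;
}
```

The check is deliberately conservative: any PC inside a listed range marks the
whole trace unreliable, without trying to decide which instructions within the
trampoline are actually safe.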

Thanks.

Madhavan