Re: [RFC PATCH v3 1/4] arm64: Introduce stack trace reliability checks in the unwinder

Josh Poimboeuf <jpoimboe@xxxxxxxxxx> · Tue, 4 May 2021 19:07:28 -0500

On Tue, May 04, 2021 at 06:13:39PM -0500, Madhavan T. Venkataraman wrote:
> 
> 
> On 5/4/21 4:52 PM, Josh Poimboeuf wrote:
> > On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@xxxxxxxxxxxxxxxxxxx wrote:
> >> @@ -44,6 +44,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> >>  	unsigned long fp = frame->fp;
> >>  	struct stack_info info;
> >>  
> >> +	frame->reliable = true;
> >> +
> > 
> > Why set 'reliable' to true on every invocation of unwind_frame()?
> > Shouldn't it be remembered across frames?
> > 
> 
> This is mainly for debug purposes in case a caller wants to print the whole stack and also
> print which functions are unreliable. For livepatch, it does not make any difference. It will
> quit as soon as it encounters an unreliable frame.

Hm, ok.  So 'frame->reliable' refers to the current frame, not the
entire stack.
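
To illustrate that reading, here is a minimal user-space sketch, not code
from the patch. The names sketch_frame and sketch_walk are hypothetical, and
the frames are assumed to be already unwound into an array. The per-frame
flag is reset for each frame, so any "whole stack" verdict has to be
accumulated by the caller:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for arm64's struct stackframe. */
struct sketch_frame {
	unsigned long pc;
	bool reliable;		/* describes this frame only */
};

/*
 * Walk an array of already-unwound frames.
 *
 * A debug-style caller (stop_on_unreliable == false) visits every frame so
 * it can print which ones are unreliable. A livepatch-style caller
 * (stop_on_unreliable == true) quits at the first unreliable frame.
 *
 * Returns true only if every visited frame was reliable; *visited receives
 * the number of frames looked at.
 */
static bool sketch_walk(const struct sketch_frame *frames, size_t n,
			bool stop_on_unreliable, size_t *visited)
{
	bool stack_reliable = true;
	size_t i;

	*visited = 0;
	for (i = 0; i < n; i++) {
		(*visited)++;
		if (frames[i].reliable)
			continue;
		/* The caller, not the unwinder, remembers this across frames. */
		stack_reliable = false;
		if (stop_on_unreliable)
			break;
	}
	return stack_reliable;
}
```

With frames marked reliable, unreliable, reliable, the debug-style walk
visits all three and the livepatch-style walk stops after two. Both report
the stack as a whole as unreliable.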

> > Also, it looks like there are several error scenarios where it returns
> > -EINVAL but doesn't set 'reliable' to false.
> > 
> 
> I wanted to make a distinction between an error situation (like stack corruption where unwinding
> has to stop) and an unreliable situation (where unwinding can still proceed). E.g., when a
> stack trace is taken for informational purposes or debug purposes, the unwinding will try to
> proceed until either the stack trace ends or an error happens.

Ok, but I don't understand how that relates to my comment.

Why wouldn't a stack corruption like !on_accessible_stack() set
'frame->reliable' to false?

In other words: for livepatch purposes, how does the caller tell the
difference between hitting the final stack record -- which returns an
error with reliable 'true' -- and a stack corruption like
!on_accessible_stack(), which also returns an error with reliable
'true'?  Surely the latter should be considered unreliable?
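
The ambiguity can be shown with a small user-space model. Everything named
sketch_* is hypothetical and not from the patch, and using -EINVAL for the
final stack record is an assumption made only for this sketch. The first
step function models the behavior described above (reliable set to true on
entry, error paths leave it untouched). The second shows one possible
alternative in which a corrupt stack also clears the flag:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes of one unwind step. */
enum sketch_event {
	SKETCH_NEXT_FRAME,	/* unwound one frame successfully */
	SKETCH_FINAL_RECORD,	/* reached the expected end of the stack */
	SKETCH_BAD_STACK	/* e.g. a failed on_accessible_stack() check */
};

struct sketch_result {
	int ret;
	bool reliable;
};

/* Modeled on the thread: reliable starts true, errors do not clear it. */
static struct sketch_result sketch_step_as_posted(enum sketch_event ev)
{
	struct sketch_result r = { .ret = 0, .reliable = true };

	if (ev != SKETCH_NEXT_FRAME)
		r.ret = -EINVAL;	/* assumed error value for both cases */
	return r;
}

/* One possible alternative: corruption is also reported as unreliable. */
static struct sketch_result sketch_step_marking_corruption(enum sketch_event ev)
{
	struct sketch_result r = sketch_step_as_posted(ev);

	if (ev == SKETCH_BAD_STACK)
		r.reliable = false;
	return r;
}

/*
 * Livepatch-style verdict based only on what the caller can see: the walk
 * ended (non-zero return) and the last frame was still flagged reliable.
 */
static bool sketch_trace_ok(struct sketch_result last)
{
	return last.ret != 0 && last.reliable;
}
```

In the first model the caller sees the same (ret, reliable) pair for the
final record and for a corrupt stack, so sketch_trace_ok() accepts both. In
the second model only the final record passes.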

-- 
Josh