Re: [PATCH v5 06/34] function_graph: Allow multiple users to attach to function graph

Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> · Wed, 20 Dec 2023 00:45:40 +0900

On Tue, 19 Dec 2023 14:23:37 +0100
Jiri Olsa <olsajiri@xxxxxxxxx> wrote:

> On Mon, Dec 18, 2023 at 10:12:45PM +0900, Masami Hiramatsu (Google) wrote:
> 
> SNIP
> 
> >  /* Both enabled by default (can be cleared by function_graph tracer flags */
> >  static bool fgraph_sleep_time = true;
> >  
> > @@ -126,9 +247,34 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
> >  	calltime = trace_clock_local();
> >  
> >  	index = current->curr_ret_stack;
> > -	RET_STACK_INC(current->curr_ret_stack);
> > +	/* ret offset = 1 ; type = reserved */
> > +	current->ret_stack[index + FGRAPH_RET_INDEX] = 1;
> >  	ret_stack = RET_STACK(current, index);
> > +	ret_stack->ret = ret;
> > +	/*
> > +	 * The unwinders expect curr_ret_stack to point to either zero
> > +	 * or an index where to find the next ret_stack. Even though the
> > +	 * ret stack might be bogus, we want to write the ret and the
> > +	 * index to find the ret_stack before we increment the stack point.
> > +	 * If an interrupt comes in now before we increment the curr_ret_stack
> > +	 * it may blow away what we wrote. But that's fine, because the
> > +	 * index will still be correct (even though the 'ret' won't be).
> > +	 * What we worry about is the index being correct after we increment
> > +	 * the curr_ret_stack and before we update that index, as if an
> > +	 * interrupt comes in and does an unwind stack dump, it will need
> > +	 * at least a correct index!
> > +	 */
> >  	barrier();
> > +	current->curr_ret_stack += FGRAPH_RET_INDEX + 1;
> > +	/*
> > +	 * This next barrier is to ensure that an interrupt coming in
> > +	 * will not corrupt what we are about to write.
> > +	 */
> > +	barrier();
> > +
> > +	/* Still keep it reserved even if an interrupt came in */
> > +	current->ret_stack[index + FGRAPH_RET_INDEX] = 1;
> 
> seems like this was set already few lines above?

Yeah, there is a trick (trap) for interrupts between writes. We can not do
atomically write the last stack entry and increment stack index. But it must
be done for shadow unwinding insinde interrupts. Thus,

(1) write a reserve type entry on the new stack entry
(2) increment curr_ret_stack to the new stack entry
(3) rewrite the new stack entry again

If an interrupt happens between (1) and (2), stack unwinder can find the
correct latest shadow stack frame from the curr_ret_stack. This interrupts
can store their shadow stack so... wait something went wrong.

If the interrupt *overwrites* the shadow stack and (3) recovers it,
if another interrupt before (3), the shadow stack will be corrupted...

OK, I think we need a "rsrv_ret_stack" index. Basically new one will do;

(1) increment rsrv_ret_stack
(2) write a reserve type entry
(3) set curr_ret_stack = rsrv_ret_stack

And before those,

(0) if rsrv_ret_stack != curr_ret_stack, write a reserve type entry at
    rsrv_ret_stack for the previous frame (which offset can be read
    from curr_ret_stack)

Than it will never be broken.
(of course when decrement curr_ret_stack, rsrv_ret_stack is also decremented)

Thank you,

> 
> jirka
> 
> > +
> >  	ret_stack->ret = ret;
> >  	ret_stack->func = func;
> >  	ret_stack->calltime = calltime;
> > @@ -159,6 +305,12 @@ int function_graph_enter(unsigned long ret, unsigned long func,
> >  			 unsigned long frame_pointer, unsigned long *retp)
> >  {
> >  	struct ftrace_graph_ent trace;
> > +	int offset;
> > +	int start;
> > +	int type;
> > +	int val;
> > +	int cnt = 0;
> > +	int i;
> >  
> >  #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
> >  	/*
> 
> SNIP
> 

-- 
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>