Re: linux-next: build warnings after merge of the tip tree

Steven Rostedt <rostedt@xxxxxxxxxxx> · Mon, 21 Mar 2022 11:28:05 -0400

On Mon, 21 Mar 2022 14:04:05 +0100
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> Ahh, something tracing. I'll go do some patches on top of it.
> 
> Also, folks, I'm thinking we should start to move to __fexit__, if CET
> SHSTK ever wants to come to kernel land return trampolines will
> insta-stop working.
> 
> Hjl, do you think we could get -mfexit to go along with -mfentry ?

If we do ever add a -mfexit, we will also need to add a __ftail__ call,
because the current function exit tracing works for functions even when
they end in tail calls.

int funcA () {
	[..]
	return funcB();
}

Can turn into:

	[..]
	pop all stack from funcA
	load reg params to funcB
	jmp funcB

Then when funcB does its

	[..]
	ret

It will pop the return address pushed by the call to funcA (funcB was only
jumped to, never called) and return to whatever called funcA, with the
proper return values.

This currently works with function graph and kretprobe tracing because of
the shadow stack. Let's say we trace both funcA and funcB:

funcA:
	call __fentry__

			Replace caller address with graph_trampoline and
			store the return caller into the shadow stack.

	[..]
	jmp funcB

funcB:
	call __fentry__

			Replace caller address with graph_trampoline and
			store the return caller (which is the
			graph_trampoline that was switched earlier) in the
			shadow stack.

	[..]
	ret

			Returns to the graph_trampoline and we trace the
			return of funcB. Then we pop off the shadow stack
			and jump to that. But the address stored there is
			the graph_trampoline itself, so it runs again.

			Returns to the graph_trampoline and we trace the
			return of funcA. Then we pop off the shadow stack
			and jump to that, which is the original caller to
			funcA.

That is, the current algorithm traces the end of both funcA and funcB
without issue, because of how the shadow stack works.
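
To make the sequence above concrete, here is a toy C model of it. This is
only a sketch under simplifying assumptions, not the kernel's code:
"addresses" are small integers, and every name in it (struct model, fentry,
do_ret, run_tail_call) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: "addresses" are small integers, not real code. */
enum addr { CALLER_OF_A = 1, GRAPH_TRAMPOLINE = 2 };
enum func { FUNC_A = 1, FUNC_B = 2 };

#define DEPTH 8

struct shadow_entry {
	enum addr ret;		/* return address taken off the real stack */
	enum func func;		/* function whose exit this entry stands for */
};

struct model {
	enum addr stack[DEPTH];		/* real stack, return addresses only */
	size_t sp;
	struct shadow_entry shadow[DEPTH];
	size_t shadow_sp;
	enum func exits[DEPTH];		/* traced function exits, in order */
	size_t nr_exits;
};

/* "call": push the return address onto the real stack. */
static void do_call(struct model *m, enum addr ret)
{
	assert(m->sp < DEPTH);
	m->stack[m->sp++] = ret;
}

/* Entry hook: save the return address in the shadow stack and
 * replace it on the real stack with the trampoline. */
static void fentry(struct model *m, enum func f)
{
	assert(m->sp > 0 && m->shadow_sp < DEPTH);
	m->shadow[m->shadow_sp].ret = m->stack[m->sp - 1];
	m->shadow[m->shadow_sp].func = f;
	m->shadow_sp++;
	m->stack[m->sp - 1] = GRAPH_TRAMPOLINE;
}

/* "ret": pop the real stack. Each time control lands on the trampoline,
 * record one exit, pop the shadow stack and jump to the saved address. */
static enum addr do_ret(struct model *m)
{
	enum addr target;

	assert(m->sp > 0);
	target = m->stack[--m->sp];
	while (target == GRAPH_TRAMPOLINE) {
		struct shadow_entry e;

		assert(m->shadow_sp > 0 && m->nr_exits < DEPTH);
		e = m->shadow[--m->shadow_sp];
		m->exits[m->nr_exits++] = e.func;
		target = e.ret;
	}
	return target;
}

/* The scenario from the text: funcA tail-calls funcB, both traced. */
static enum addr run_tail_call(struct model *m)
{
	do_call(m, CALLER_OF_A);	/* caller: call funcA */
	fentry(m, FUNC_A);		/* funcA: call __fentry__ */
	/* funcA: jmp funcB, which leaves the real stack untouched */
	fentry(m, FUNC_B);		/* funcB: call __fentry__ */
	return do_ret(m);		/* funcB: ret */
}
```

Calling run_tail_call() on a zeroed struct model records the exit of funcB,
then the exit of funcA, and ends up at the original caller of funcA, with
both stacks empty again.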

Now if we add a __fexit__, we will need a way to tell the tracers how to
record this scenario. That is why I'm thinking of a jmp to __ftail__.

Perhaps something like:

funcA:
	call __fentry__
	[..]
	push address of funcB
	jmp __ftail__
	jmp funcB

Where, __ftail__ would do at the end:

	ret

That returns into funcB, so the "jmp funcB" after it is skipped anyway.

And to "nop" it out, we would have to convert it to:

funcA:
	call __fentry__
	[..]
	jmp 1
	jmp __ftail__
1:	jmp funcB

This is one way I can think of to do it if we include a __fexit__. To
maintain backward compatibility with function graph tracing (which is a
requirement), we need to be able to handle such cases.
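
For what it's worth, the control flow of the two variants above (hooked and
"nop"ed) can be modelled in a few lines of C. Again just a sketch: nothing
like __ftail__ exists today, all names here are invented, and what the hook
would actually record is my assumption (here it only bumps a counter).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: "addresses" are small integers, not real code. */
enum addr { CALLER_OF_A = 1, FUNC_B_ENTRY = 2 };

#define DEPTH 8

struct cpu {
	enum addr stack[DEPTH];	/* real stack, return addresses only */
	size_t sp;
	enum addr pc;		/* where control ends up */
	int tail_events;	/* how often the __ftail__ hook ran */
};

static void push(struct cpu *c, enum addr a)
{
	assert(c->sp < DEPTH);
	c->stack[c->sp++] = a;
}

static enum addr pop(struct cpu *c)
{
	assert(c->sp > 0);
	return c->stack[--c->sp];
}

/* __ftail__: let the tracer see the tail call, then "ret". */
static void ftail(struct cpu *c)
{
	c->tail_events++;	/* assumption: funcA's exit is recorded here */
	c->pc = pop(c);		/* ret consumes the pushed funcB address */
}

/* The tail of funcA, in its hooked and "nop"ed forms. */
static void func_a_tail(struct cpu *c, bool tracing_enabled)
{
	if (tracing_enabled) {
		push(c, FUNC_B_ENTRY);	/* push address of funcB */
		ftail(c);		/* jmp __ftail__ */
	} else {
		c->pc = FUNC_B_ENTRY;	/* jmp 1; 1: jmp funcB */
	}
}
```

Either way funcB starts with the return address of funcA's caller on top of
the stack, exactly as with a plain tail call. The only difference is whether
the hook got to run.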

Perhaps this is a good topic to bring up at Plumbers? :-)

Do I need to submit a tracing MC, or can we have this conversation at a
compiler / toolchain MC?

-- Steve