On Mon, 21 Mar 2022 14:04:05 +0100
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> Ahh, something tracing. I'll go do some patches on top of it.
>
> Also, folks, I'm thinking we should start to move to __fexit__, if CET
> SHSTK ever wants to come to kernel land return trampolines will
> insta-stop working.
>
> Hjl, do you think we could get -mfexit to go along with -mfentry ?

If we do ever add a -mfexit, we will need to add a __ftail__ call.
That's because the current function exit tracing works for functions
even with tail calls.

int funcA () {
        [..]
        return funcB();
}

Can turn into:

        [..]
        pop all stack from funcA
        load reg params to funcB
        jmp funcB

Then when funcB does its

        [..]
        ret

It will pop the call site of funcA (not the call site of funcB) and
return to wherever called funcA with the proper return values.

This currently works with function graph and kretprobe tracing because
of the shadow stack. Let's say we traced both funcA and funcB:

funcA:
        call __fentry__
                Replace caller address with graph_trampoline and store
                the return caller into the shadow stack.
        [..]
        jmp funcB

funcB:
        call __fentry__
                Replace caller address with graph_trampoline and store
                the return caller (which is the graph_trampoline that
                was switched earlier) in the shadow stack.
        [..]
        ret
                Returns to the graph_trampoline and we trace the return
                of funcB. Then we pop off the shadow stack and jump to
                that. But the shadow stack had a call to the
                graph_trampoline, which gets called again.

                Returns to the graph_trampoline and we trace the return
                of funcA. Then we pop off the shadow stack and jump to
                that, which is the original caller to funcA.

That is, the current algorithm traces the end of both funcA and funcB
without issue, because of how the shadow stack works.

Now if we add a __fexit__, we will need a way to tell the tracers how to
record this scenario. That is why I'm thinking of a jmp to __ftail__.
Perhaps something like:

funcA:
        call __fentry__
        [..]
        push address of funcB
        jmp __ftail__
        jmp funcB

Where __ftail__ would do at the end:

        ret

to jump to funcB, so we skip the jmp to funcB anyway.

And to "nop" it out, we would have to convert it to:

funcA:
        call __fentry__
        [..]
        jmp 1f
        jmp __ftail__
1:      jmp funcB

This is one way I can think of if we include a __fexit__. But to
maintain backward compatibility with function graph tracing (which is a
requirement), we need to be able to handle such cases.

Perhaps this is a good topic to bring up at Plumbers? :-)

Do I need to submit a tracing MC, or can we have this conversation at a
compiler / toolchain MC?

-- Steve
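
For readers who want to step through the tail-call case above, here is a
minimal toy model of the shadow-stack bookkeeping. It is only a sketch
under stated assumptions: it is not kernel code, and every name in it
(TRAMPOLINE, Tracer, fentry, trampoline, run) is hypothetical. It models
a single return-address slot that funcA and funcB share because funcA
reaches funcB with a jmp rather than a call.

```python
# Toy model of the return-address replacement described above.
# All names are hypothetical illustrations, not kernel APIs.

TRAMPOLINE = "graph_trampoline"


class Tracer:
    def __init__(self):
        self.shadow = []  # shadow stack of (function, saved return address)
        self.log = []     # recorded trace events

    def fentry(self, func, stack):
        """Model `call __fentry__`: save and replace the return address."""
        self.log.append(f"entry {func}")
        self.shadow.append((func, stack[-1]))
        stack[-1] = TRAMPOLINE

    def trampoline(self):
        """Model graph_trampoline: trace the exit, hand back the saved address."""
        func, ret = self.shadow.pop()
        self.log.append(f"exit {func}")
        return ret


def run():
    tracer = Tracer()
    stack = ["caller_of_funcA"]    # return address pushed by the call to funcA

    tracer.fentry("funcA", stack)  # funcA: call __fentry__
    # funcA tail-calls funcB with `jmp funcB`: no new return address is pushed.
    tracer.fentry("funcB", stack)  # funcB: call __fentry__
    pc = stack.pop()               # funcB: ret

    # Each return into the trampoline pops one shadow-stack entry and jumps to it.
    while pc == TRAMPOLINE:
        pc = tracer.trampoline()
    return tracer.log, pc


if __name__ == "__main__":
    events, final_pc = run()
    print(events)
    print(final_pc)
```

In this model the second shadow-stack entry holds the trampoline address
itself, which is why the trampoline runs twice and both exits get
recorded before control reaches the original caller of funcA.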