* David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:

> On Tue, 2018-01-23 at 08:53 +0100, Ingo Molnar wrote:
> >
> > The patch below demonstrates the principle, it forcibly enables dynamic
> > ftrace patching (CONFIG_DYNAMIC_FTRACE=y et al) and turns mcount/__fentry__
> > into a RET:
> >
> >   ffffffff81a01a40 <__fentry__>:
> >   ffffffff81a01a40:       c3                      retq
> >
> > This would have to be extended with (very simple) call stack depth tracking
> > (just 3 more instructions would do in the fast path I believe) and a
> > suitable SkyLake workaround (and also has to play nice with the ftrace
> > callbacks).
> >
> > On non-SkyLake the overhead would be 0 cycles.
>
> The overhead of forcing CONFIG_DYNAMIC_FTRACE=y is precisely zero
> cycles? That seems a little optimistic. ;)

The overhead of the quick-hack patch I sent to show exactly which code I mean
is obviously not zero.

The overhead of my proposed solution - using the function call callback that
CONFIG_DYNAMIC_FTRACE=y provides - is exactly zero on non-SkyLake systems
where the callback is patched out, which is the case on typical Linux distros.

The callback is widely enabled on distro kernels:

  Fedora:                     CONFIG_DYNAMIC_FTRACE=y
  Ubuntu:                     CONFIG_DYNAMIC_FTRACE=y
  OpenSuse (default flavor):  CONFIG_DYNAMIC_FTRACE=y

BTW., the reason this is enabled on all distro kernels is that the overhead is
a single patched-in NOP instruction in the function prologue when tracing is
disabled. So it's not even a CALL+RET - it's a patched-in NOP.

Thanks,

	Ingo
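
PS. To illustrate the "patched-in NOP" point: with CONFIG_DYNAMIC_FTRACE=y the
compiler (gcc -pg -mfentry) emits a 5-byte 'call __fentry__' as the first
instruction of every traceable function, and ftrace rewrites those call sites
at boot. The sketch below is illustrative only - <some_function>, its address
and the encoded call offset are made up, only the <__fentry__> address is
taken from the disassembly quoted above.

With tracing disabled, the call site is a plain 5-byte NOP:

  ffffffff810123a0 <some_function>:
  ffffffff810123a0:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

With a callback registered, the same 5 bytes are live-patched back into the
call:

  ffffffff810123a0:       e8 9b f6 9e 00          callq  ffffffff81a01a40 <__fentry__>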