* David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:

> On Tue, 2018-01-23 at 11:15 +0100, Ingo Molnar wrote:
> >
> > BTW., the reason this is enabled on all distro kernels is because the overhead
> > is a single patched-in NOP instruction in the function epilogue, when tracing
> > is disabled. So it's not even a CALL+RET - it's a patched in NOP.
>
> Hm? We still have GCC emitting 'call __fentry__' don't we? Would be nice to get
> to the point where we can patch *that* out into a NOP... or are you saying we
> already can?

Yes, we already can and do patch the 'call __fentry__/mcount' call site into a
NOP today - all 50,000+ call sites on a typical distro kernel. We did so for a
long time - this is all a well established, working mechanism.

> But this is a digression. I was being pedantic about the "0 cycles" but sure,
> this would be perfectly tolerable.

It's not a digression in two ways:

 - I wanted to make it clear that for distro kernels it _is_ a zero cycles
   overhead mechanism for non-SkyLake CPUs, literally.

 - I noticed that Meltdown and the CR3 writes for PTI appear to have
   established a kind of ... insensitivity and numbness to kernel micro-costs,
   which peaked with the per-syscall MSR write nonsense patch of the SkyLake
   workaround. That attitude is totally unacceptable to me as x86 maintainer
   and yes, still every cycle counts.

Thanks,

	Ingo