Hi Vincenzo,

On Fri, Mar 13, 2020 at 03:43:40PM +0000, Vincenzo Frascino wrote:
> The vDSO library should only include the necessary headers required for
> a userspace library (UAPI and a minimal set of kernel headers). To make
> this possible it is necessary to isolate from the kernel headers the
> common parts that are strictly necessary to build the library.
>
> Introduce asm/vdso/arch_timer.h to contain all the arm64 specific
> code. This allows to replace the second isb() in __arch_get_hw_counter()
> with a fake dependent stack read of the counter which improves the vdso
> library peformances of ~4.5%. Below the results of vdsotest [1] ran for
> 100 iterations.
>
> Before the patch:
> =================
> clock-gettime-monotonic: syscall: 771 nsec/call
> clock-gettime-monotonic: libc: 130 nsec/call
> clock-gettime-monotonic: vdso: 111 nsec/call
> ...
> clock-gettime-realtime: syscall: 762 nsec/call
> clock-gettime-realtime: libc: 130 nsec/call
> clock-gettime-realtime: vdso: 111 nsec/call
>
> After the patch:
> ================
> clock-gettime-monotonic: syscall: 792 nsec/call
> clock-gettime-monotonic: libc: 124 nsec/call
> clock-gettime-monotonic: vdso: 106 nsec/call
> ...
> clock-gettime-realtime: syscall: 776 nsec/call
> clock-gettime-realtime: libc: 124 nsec/call
> clock-gettime-realtime: vdso: 106 nsec/call
>
> [1] https://github.com/nathanlynch/vdsotest
>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: Mark Rutland <Mark.Rutland@xxxxxxx>
> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@xxxxxxx>
> ---
>  arch/arm64/include/asm/arch_timer.h        | 29 ++++---------------
>  arch/arm64/include/asm/vdso/arch_timer.h   | 33 ++++++++++++++++++++++
>  arch/arm64/include/asm/vdso/gettimeofday.h |  7 +++--
>  3 files changed, 42 insertions(+), 27 deletions(-)
>  create mode 100644 arch/arm64/include/asm/vdso/arch_timer.h
>
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index 7ae54d7d333a..7f22cd00ad45 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -164,24 +164,7 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
>  	isb();
>  }
>  
> -/*
> - * Ensure that reads of the counter are treated the same as memory reads
> - * for the purposes of ordering by subsequent memory barriers.
> - *
> - * This insanity brought to you by speculative system register reads,
> - * out-of-order memory accesses, sequence locks and Thomas Gleixner.
> - *
> - * http://lists.infradead.org/pipermail/linux-arm-kernel/2019-February/631195.html
> - */
> -#define arch_counter_enforce_ordering(val) do {	\
> -	u64 tmp, _val = (val);	\
> -	\
> -	asm volatile(	\
> -	"	eor %0, %1, %1\n"	\
> -	"	add %0, sp, %0\n"	\
> -	"	ldr xzr, [%0]"	\
> -	: "=r" (tmp) : "r" (_val));	\
> -} while (0)
> +#include <asm/vdso/arch_timer.h>
>  
>  static __always_inline u64 __arch_counter_get_cntpct_stable(void)
>  {
> @@ -189,7 +172,7 @@ static __always_inline u64 __arch_counter_get_cntpct_stable(void)
>  
>  	isb();
>  	cnt = arch_timer_reg_read_stable(cntpct_el0);
> -	arch_counter_enforce_ordering(cnt);
> +	cnt = arch_counter_enforce_ordering(cnt);
>  	return cnt;

Why have you changed the structure of arch_counter_enforce_ordering() to
return a value? The commit message has no rationale for that.

If there is a reason to change that, I'd prefer the driver change as one
patch, before moving the definition.

[...]

> +/*
> + * Ensure that reads of the counter are treated the same as memory reads
> + * for the purposes of ordering by subsequent memory barriers.
> + *
> + * This insanity brought to you by speculative system register reads,
> + * out-of-order memory accesses, sequence locks and Thomas Gleixner.
> + *
> + * http://lists.infradead.org/pipermail/linux-arm-kernel/2019-February/631195.html
> + *
> + */
> +static u64 arch_counter_enforce_ordering(u64 val)
> +{
> +	u64 tmp, _val = (val);
> +
> +	asm volatile(
> +	"	eor %0, %1, %1\n"
> +	"	add %0, sp, %0\n"
> +	"	ldr xzr, [%0]"
> +	: "=r" (tmp) : "r" (_val));
> +
> +	return _val;
> +}

This change has no functional effect. Since `_val` is only passed in as
an input parameter, the compiler can assume the assembly has no effect
on it.

As above, what is the rationale for changing this?

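To make the point about input operands concrete, here is a minimal
sketch. It is not the arm64 code: the empty asm template stands in for
the eor/add/ldr sequence so that it builds with GCC or Clang on any
64-bit target, and the name enforce_ordering_sketch() is made up for
illustration.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the function form of
 * arch_counter_enforce_ordering(). The asm template is deliberately
 * empty (an assumption for portability); only the operand constraints
 * mirror the quoted patch.
 */
static inline uint64_t enforce_ordering_sketch(uint64_t val)
{
	uint64_t tmp;

	/*
	 * tmp is the only output operand. val is input-only ("r"), so
	 * the compiler is entitled to assume the asm leaves it unchanged.
	 */
	__asm__ volatile("" : "=r" (tmp) : "r" (val));
	(void)tmp;

	/* The caller gets back exactly the value it passed in. */
	return val;
}
```

With these constraints, `cnt = enforce_ordering_sketch(cnt);` and a bare
`enforce_ordering_sketch(cnt);` leave cnt holding the same value, so the
return value carries no extra information for the caller.
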
> @@ -82,10 +83,10 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
>  	isb();
>  	asm volatile("mrs %0, cntvct_el0" : "=r" (res) :: "memory");
>  	/*
> -	 * This isb() is required to prevent that the seq lock is
> -	 * speculated.#
> +	 * arch_counter_enforce_ordering() is required to prevent that
> +	 * the seq lock is speculated.
>  	 */
> -	isb();
> +	res = arch_counter_enforce_ordering(res);

Can we delete the comment entirely? We don't bother in
<asm/arch_timer.h>.

Even better, can we factor out __arch_counter_get_cntvct(), and use
that?

Thanks,
Mark.