From: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
Date: Mon, 12 Jun 2017 12:48:27 -0400

> @@ -853,13 +851,19 @@ unsigned long long sched_clock(void)
>  {
>  	unsigned long quotient = tick_operations.ticks_per_nsec_quotient;
>  	unsigned long offset = tick_operations.offset;
> -	unsigned long ticks = tick_operations.get_tick();
>
> -	return ((ticks * quotient) >> SPARC64_NSEC_PER_CYC_SHIFT) - offset;
> +	/* Use wmb so the compiler emits the loads first and overlaps load
> +	 * latency with reading tick, because reading %tick/%stick is a
> +	 * post-sync instruction that will flush and restart subsequent
> +	 * instructions after it commits.
> +	 */
> +	wmb();
> +
> +	return ((get_tick() * quotient) >> SPARC64_NSEC_PER_CYC_SHIFT) - offset;
>  }

I think you need to use barrier() here, not wmb().

wmb() orders memory operations wrt. other memory operations.
get_tick() neither modifies nor accesses memory, so as far as the
compiler is concerned it can still legally order the loads after
get_tick() if it really wanted to.

barrier() emits a volatile empty asm, which strictly orders all
operations before and after the barrier().
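
To make the suggestion concrete, here is a minimal user-space sketch of the shape sched_clock() would take with barrier() in place of wmb(). It is not the kernel code: barrier() is written out as the empty volatile asm with a "memory" clobber that GCC and Clang accept, and tick_ops_sketch, fake_get_tick(), fake_tick_value, NSEC_PER_CYC_SHIFT and sched_clock_sketch() are hypothetical stand-ins invented for illustration. The real get_tick() reads %tick/%stick via inline asm, whereas the stand-in here is an ordinary function, so this only shows where the barrier sits relative to the loads.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier: an empty volatile asm with a "memory" clobber.
 * It emits no instruction; it only stops the compiler from moving
 * memory accesses across this point. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical shift value standing in for SPARC64_NSEC_PER_CYC_SHIFT. */
#define NSEC_PER_CYC_SHIFT 10

/* Hypothetical stand-in for the kernel's tick_operations structure. */
struct tick_ops_sketch {
	uint64_t (*get_tick)(void);
	uint64_t ticks_per_nsec_quotient;
	uint64_t offset;
};

/* Fake tick source; the real one reads %tick/%stick. */
static uint64_t fake_tick_value = 1000;

static uint64_t fake_get_tick(void)
{
	return fake_tick_value;
}

struct tick_ops_sketch tick_operations = {
	.get_tick = fake_get_tick,
	.ticks_per_nsec_quotient = 2048,
	.offset = 5,
};

uint64_t sched_clock_sketch(void)
{
	/* Both loads are written before the barrier ... */
	uint64_t quotient = tick_operations.ticks_per_nsec_quotient;
	uint64_t offset = tick_operations.offset;

	assert(NSEC_PER_CYC_SHIFT < 64);

	/* ... so the compiler may not sink them below this point. */
	barrier();

	/* The tick read is written after the barrier. */
	return ((tick_operations.get_tick() * quotient) >> NSEC_PER_CYC_SHIFT) - offset;
}
```

With the stand-in values above, a tick of 1000 gives ((1000 * 2048) >> 10) - 5 = 1995. The arithmetic is unchanged from the quoted patch; only the ordering constraint seen by the compiler differs.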