Re: [v2 8/8] sparc64: optimize functions that access tick

David Miller <davem@xxxxxxxxxxxxx> · Mon, 12 Jun 2017 15:13:15 -0400 (EDT)

From: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
Date: Mon, 12 Jun 2017 12:48:27 -0400

> @@ -853,13 +851,19 @@ unsigned long long sched_clock(void)
>  {
>  	unsigned long quotient = tick_operations.ticks_per_nsec_quotient;
>  	unsigned long offset = tick_operations.offset;
> -	unsigned long ticks = tick_operations.get_tick();
>  
> -	return ((ticks * quotient) >> SPARC64_NSEC_PER_CYC_SHIFT) - offset;
> +	/* Use wmb so the compiler emits the loads first and overlaps load
> +	 * latency with reading tick, because reading %tick/%stick is a
> +	 * post-sync instruction that will flush and restart subsequent
> +	 * instructions after it commits.
> +	 */
> +	wmb();
> +
> +	return ((get_tick() * quotient) >> SPARC64_NSEC_PER_CYC_SHIFT) - offset;
>  }

I think you need to use barrier() here not wmb().

wmb() orders memory operations wrt. other memory operations.

get_tick() neither modifies nor accesses memory, so as far as the
compiler is concerned it can still legally order the loads after
get_tick() if it really wanted to.

barrier() emits a volatile empty asm, which strictly orders all
operations before and after the barrier().
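
As a rough userspace sketch of the suggestion (not the actual patch):
barrier() is written out below the way it is usually defined for
GCC-compatible compilers, as an empty volatile asm with a "memory"
clobber. The names tick_ops_sketch, get_tick_sketch(), fake_tick_value
and NSEC_PER_CYC_SHIFT_SKETCH, and the values used, are made up for
illustration and only stand in for tick_operations, get_tick() and
SPARC64_NSEC_PER_CYC_SHIFT.

```c
#include <stdint.h>

/* Compiler-only barrier: an empty volatile asm with a "memory" clobber.
 * It emits no instruction; it only constrains compiler reordering.
 */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for SPARC64_NSEC_PER_CYC_SHIFT. */
#define NSEC_PER_CYC_SHIFT_SKETCH 10UL

/* Hypothetical stand-in for the tick_operations fields used here. */
struct tick_ops_sketch {
	unsigned long ticks_per_nsec_quotient;
	unsigned long offset;
};

static struct tick_ops_sketch tick_ops = {
	.ticks_per_nsec_quotient = 1024,
	.offset = 5,
};

/* Hypothetical stand-in for reading %tick/%stick. The real get_tick()
 * reads a register; this one returns a settable value so the sketch
 * runs anywhere.
 */
static volatile unsigned long fake_tick_value = 100;

static unsigned long get_tick_sketch(void)
{
	return fake_tick_value;
}

unsigned long long sched_clock_sketch(void)
{
	unsigned long quotient = tick_ops.ticks_per_nsec_quotient;
	unsigned long offset = tick_ops.offset;

	/* The "memory" clobber tells the compiler that memory may change
	 * here, so the two loads above cannot be sunk below this point.
	 */
	barrier();

	return ((get_tick_sketch() * quotient) >> NSEC_PER_CYC_SHIFT_SKETCH) - offset;
}
```

With the made-up values above, sched_clock_sketch() computes
((100 * 1024) >> 10) - 5. The barrier does not change the result; it
only pins the two loads ahead of the tick read in the generated code.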