Re: [AArch64][Spec2017]Question about mlow-precision-div optimization.

Wilco Dijkstra <Wilco.Dijkstra@xxxxxxx> · Thu, 27 Feb 2020 17:01:52 +0000

Hi,

> The data I presented was acquired on a Cortex-A57 CPU.

> The point you mentioned, that on some modern CPUs fdiv is faster than the reciprocal
> approximation, is a new aspect I hadn't come across.

Well, on Cortex-A57 division is also faster, e.g. lbm_r is ~3% slower using reciprocal division.

> And do you think it would be worth providing a parameter to alter the number of iterations,
> so that accuracy can be traded off against speed?

What do you mean? We already have -mlow-precision-div (and -mlow-precision-sqrt/-mlow-precision-recip-sqrt).

> Since SPEC2017 checks the results and reports any miscomputed cases,
> I suppose the performance improvement is valid.

Try perf stat to show instruction counts. The reciprocal sequence adds extra operations, so the count
should go up; if it does not, the benchmark is running incorrectly even if it passes the basic checks.

Cheers,
Wilco