Re: [AArch64][Spec2017]Question about mlow-precision-div optimization.

"Bu Le" <cityubule@xxxxxx> · Wed, 26 Feb 2020 01:01:42 +0800

Hi,

Thanks for the reply.

The data I presented was acquired on a Cortex-A57 CPU.

Since SPEC2017 checks the results and produces a test report that flags miscomputed cases, I suppose the performance improvement is valid for the benchmarks that passed.

Your point that on some modern CPUs fdiv is faster than the reciprocal approximation is something I had not come across.

Nevertheless, on a CPU where the reciprocal approximation is profitable, as in my case, may I ask why the number of Newton iterations is fixed at 2 and 3?

And do you think it would be worth providing a parameter to change the iteration count, so that accuracy can be traded off against speed?
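
To make the question concrete, here is a minimal sketch in portable C of a reciprocal refined by a configurable number of Newton steps. It is not the code GCC generates: crude_recip_estimate() is a hypothetical stand-in for a hardware reciprocal estimate, assumed here to be good to roughly 8 bits, and the names approx_recip() and rel_error() are made up for illustration.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for a hardware reciprocal estimate (assumption):
   keep only about 8 significant bits of 1/d. */
static double crude_recip_estimate(double d)
{
    int exp;
    double m = frexp(1.0 / d, &exp);        /* |m| is in [0.5, 1) */
    m = (double)(long)(m * 256.0) / 256.0;  /* truncate the mantissa to 8 bits */
    return ldexp(m, exp);
}

/* Newton-Raphson refinement of x ~= 1/d: x' = x * (2 - d * x).
   Each step roughly squares the relative error. */
static double approx_recip(double d, int iterations)
{
    assert(d != 0.0);
    double x = crude_recip_estimate(d);
    for (int i = 0; i < iterations; i++)
        x = x * (2.0 - d * x);
    return x;
}

/* Relative error of the approximation, measured as |d * x - 1|. */
static double rel_error(double d, int iterations)
{
    double e = approx_recip(d, iterations) * d - 1.0;
    return e < 0.0 ? -e : e;
}
```

Calling rel_error(3.0, n) for n = 0..3 should show the error roughly squaring at each step. If the starting estimate really is good to about 8 bits, that would be consistent with two steps covering float's 24-bit significand and three steps covering double's 53 bits, and it shows what dropping one step costs in accuracy.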

By the way, the original data is as follows.

Test case          |  Improvement
603.bwaves_s          7.92%
607.cactuBSSN_s       Output miscompare
619.lbm_s             32.34%
621.wrf_s             Output miscompare
627.cam4_s            Output miscompare
628.pop2_s            Output miscompare
638.imagick_s         -0.97%
644.nab_s             9.09%
649.fotonik3d_s       Output miscompare
654.roms_s            -3.45%

------------------ Original ------------------
From: "Wilco Dijkstra" <Wilco.Dijkstra@xxxxxxx>;
Date: Mon, Feb 24, 2020 08:59 PM
To: "gcc-help@xxxxxxxxxxx" <gcc-help@xxxxxxxxxxx>; "Bu Le" <cityubule@xxxxxx>;

Subject: Re: [AArch64][Spec2017]Question about mlow-precision-div optimization.

Hi,

> I found that the mlow-precision-div option have a fix number of newton iterations,
> which is 2 for float type and 3 for double type.
>
> I noticed that if I alter the numbers of newton iterations as following, it could leads
> to faster performance in SPEC2017 fpspeed test on AArch64, with less but
> acceptable precision.

Which CPU did you try this on? Those results look suspicious - lbm hardly does any
divisions for example, so either the computation has gone wrong due to the lower
accuracy or your CPU has a really slow divide...

On modern cores it is faster to do a division than to use the division approximation
instructions. E.g. on Neoverse N1 a float division takes at most 10 cycles while the
reduced approximation takes 13 cycles (and needs 3 extra instructions which take up
decode and issue slots).

Cheers,
Wilco