[AArch64][Spec2017]Question about mlow-precision-div optimization.

"Bu Le" <cityubule@xxxxxx> · Sat, 22 Feb 2020 10:30:02 +0800

Hello world,

I found that the -mlow-precision-div option uses a fixed number of Newton iterations: 2 for float and 3 for double.

I noticed that reducing the number of Newton iterations, as shown below, leads to faster performance in the SPEC2017 fpspeed tests on AArch64, with lower but acceptable precision.

Before change:

    frecpe  s2, s8
    frecps  s4, s2, s8
    fmul    s2, s2, s4
    frecps  s4, s2, s8
    fmul    s2, s2, s4
    fmul    s10, s2

After change:

    frecpe  s2, s8
    frecps  s4, s2, s8
    fmul    s2, s2, s4
    fmul    s10, s2
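To make the precision trade-off concrete, here is a minimal Python sketch of the refinement that each frecps/fmul pair performs. It is not GCC code. The function names are hypothetical, and crude_estimate is only a stand-in for frecpe, assuming the hardware estimate is accurate to roughly 8 bits. Each Newton step computes x * (2 - d * x), which squares the relative error, so the number of correct bits roughly doubles per step.

```python
def crude_estimate(d, bits=8):
    # Hypothetical stand-in for frecpe: the exact reciprocal perturbed by a
    # relative error of 2**-bits (assumption: about 8 good bits).
    return (1.0 / d) * (1.0 - 2.0 ** -bits)


def refine_reciprocal(d, estimate, iterations):
    # Each pass mirrors one frecps + fmul pair:
    #   frecps computes (2 - d * x), fmul multiplies it back into x.
    x = estimate
    for _ in range(iterations):
        x = x * (2.0 - d * x)
    return x


def relative_error(d, x):
    # For a perfect reciprocal, x * d == 1.
    return abs(x * d - 1.0)


d = 3.0
for n in range(4):
    err = relative_error(d, refine_reciprocal(d, crude_estimate(d), n))
    print(f"{n} step(s): relative error ~ {err:.3e}")
```

Under that assumption, one step gives about 16 good bits (fewer than the 24-bit float significand) and two steps give about 32 bits (fewer than the 53-bit double significand), which may be why some benchmarks miscompare with the reduced counts.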

The improvements in detail are as follows (with the number of Newton iterations changed to 1 for float and 2 for double):

Test case          Improvement
603.bwaves_s       7.92%
607.cactuBSSN_s    Output miscompare
619.lbm_s          32.34%
621.wrf_s          Output miscompare
627.cam4_s         Output miscompare
628.pop2_s         Output miscompare
638.imagick_s      -0.97%
644.nab_s          9.09%
649.fotonik3d_s    Output miscompare
654.roms_s         -3.45%

This may improve the performance of test cases that do not demand high precision.

Given that the precision of division is already lower than the IEEE standard requires when this option is on, why are the iteration counts fixed at the magic numbers 2 and 3?

Should we provide a parameter so that users can adjust these values according to their needs?