[AArch64][SPEC2017] Question about -mlow-precision-div optimization


Hello world,

 
I found that the -mlow-precision-div option uses a fixed number of Newton iterations: 2 for the float type and 3 for the double type.

I noticed that changing the number of Newton iterations as follows can lead to faster performance in the SPEC2017 fpspeed tests on AArch64, with lower but acceptable precision.
 
Before change:

frecpe  s2, s8
frecps  s4, s2, s8
fmul    s2, s2, s4
frecps  s4, s2, s8
fmul    s2, s2, s4
fmul    s10, s2

After change:

frecpe  s2, s8
frecps  s4, s2, s8
fmul    s2, s2, s4
fmul    s10, s2
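
For readers less familiar with the sequence: frecpe produces an initial estimate of 1/d, and each frecps/fmul pair is one Newton step, x' = x * (2 - d * x), which roughly squares the relative error. The sketch below emulates this in Python using double arithmetic. The function names and the truncation-based initial estimate (assumed to carry about 8 significant bits) are hypothetical stand-ins, not the hardware's frecpe lookup or GCC's implementation; it is only meant to illustrate how much precision one iteration is worth.

```python
import math


def reciprocal_estimate(d, bits=8):
    """Crude stand-in for frecpe: 1/d truncated to about `bits` significant bits.

    Assumption: d > 0. The real instruction uses a lookup table instead.
    """
    m, e = math.frexp(1.0 / d)  # 1/d == m * 2**e with 0.5 <= m < 1
    scale = 1 << bits
    return math.ldexp(math.floor(m * scale) / scale, e)


def newton_reciprocal(d, iterations, bits=8):
    """Refine the estimate of 1/d with `iterations` Newton steps."""
    x = reciprocal_estimate(d, bits)
    for _ in range(iterations):
        step = 2.0 - d * x  # what frecps computes
        x = x * step        # the fmul that follows it
    return x


def relative_error(d, iterations):
    """Relative error of the refined reciprocal, |d * x - 1|."""
    return abs(newton_reciprocal(d, iterations) * d - 1.0)


if __name__ == "__main__":
    for n in range(4):
        print(n, relative_error(3.0, n))
```

In this model, one step on an 8-bit estimate leaves an error well above float's rounding level of 2^-24, while two steps fall below it; likewise two steps stay above double's 2^-53 and three steps reach it. That would be consistent with the defaults of 2 and 3, and with the reduced counts trading roughly half of the correct bits for speed.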
 
 
 
The details of the improvement are shown below (with the number of Newton iterations changed to 1 for float and 2 for double):

Test case          Improvement
603.bwaves_s       7.92%
607.cactuBSSN_s    Output miscompare
619.lbm_s          32.34%
621.wrf_s          Output miscompare
627.cam4_s         Output miscompare
628.pop2_s         Output miscompare
638.imagick_s      -0.97%
644.nab_s          9.09%
649.fotonik3d_s    Output miscompare
654.roms_s         -3.45%
  
  
This may benefit the performance of test cases that do not demand high precision.

Given that the precision of division is already lower than the IEEE standard when this option is on, why is the iteration count fixed at the magic numbers 2 and 3?

Should we provide a parameter so that users can adjust this value according to their needs?



