Hi,

> I found that the -mlow-precision-div option has a fixed number of Newton iterations,
> which is 2 for float type and 3 for double type.
>
> I noticed that if I alter the number of Newton iterations as follows, it could lead
> to faster performance in the SPEC2017 fpspeed test on AArch64, with less but
> acceptable precision.

Which CPU did you try this on? Those results look suspicious: lbm hardly does any
divisions, for example, so either the computation has gone wrong due to the lower
accuracy or your CPU has a really slow divide...

On modern cores it is faster to do a division than to use the division approximation
instructions. E.g. on Neoverse N1 a float division takes at most 10 cycles while the
reduced approximation takes 13 cycles (and needs 3 extra instructions which take up
decode and issue slots).

Cheers,
Wilco
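
For context on the accuracy side of the trade-off, here is a minimal Python sketch of how the number of Newton iterations affects a reciprocal approximation. It is not GCC's actual expansion: the helper names (`rough_reciprocal`, `refined_reciprocal`, `relative_error`) are hypothetical, the arithmetic is done in double precision, and the 8-bit starting estimate is an assumption standing in for a hardware reciprocal-estimate instruction.

```python
import math

def rough_reciprocal(d, bits=8):
    """Stand-in for a hardware estimate: 1/d truncated to `bits` significant bits.

    The 8-bit default is an assumption made for illustration only.
    """
    m, e = math.frexp(1.0 / d)            # 1/d == m * 2**e, with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(math.floor(m * scale) / scale, e)

def refined_reciprocal(d, iterations, bits=8):
    """Refine the rough estimate with `iterations` Newton steps."""
    x = rough_reciprocal(d, bits)
    for _ in range(iterations):
        x = x * (2.0 - d * x)             # Newton step for f(x) = 1/x - d
    return x

def relative_error(d, iterations, bits=8):
    """Relative error of the refined reciprocal, measured as |d*x - 1|."""
    return abs(d * refined_reciprocal(d, iterations, bits) - 1.0)

if __name__ == "__main__":
    for n in range(4):
        print(n, relative_error(3.0, n))
```

Each step roughly squares the relative error, so under the 8-bit assumption one step gives about 16 good bits, two steps about 32 (more than a float's 24-bit significand), and three steps about 64 (more than a double's 53). Dropping an iteration therefore roughly halves the number of correct bits, which is why a benchmark's output should be validated before the speedup is trusted.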