Re: Options for using hardware implementation of remainder and square root on i32

Tim Prince <TimothyPrince@xxxxxxxxxxxxx> · Sun, 04 Jan 2009 16:20:05 -0800

Kwok, Yipkei wrote:
Although the Intel x86 processors are capable of computing remainder 
and square root operations in hardware, both g++ and icpc (Intel 
compiler) realize these operations in software [1].

Question 1: Is there any specific reason why they do it this way
(using software implementation instead of hardware)?

The usual way to implement sqrt, when optimizing, is to execute the
sqrt instruction in-line, check the processor flags, and retry with the
library function if any flag is set that indicates a need for errno
processing or exception handling.

I don't know whether C99 remainder functions are commonly available in
compilers when -std=c99 is set, or what g++ and icpc may do to permit
their use as an extension beyond the C++ standard. Assuming that they
have to be implemented with x87 code, while one would normally be using
SSE code generation options, the performance implications are such that
the code size vs. performance tradeoff dictates a library function call.
When you have performance- or accuracy-critical code requiring
remaindering, you probably have to study the facilities of the target
platform.

Question 2: Is there any compiler option that forces these operations
to be done in hardware?

In order to support Fortran, gcc has a specific flag to turn off the
sqrt retry, thus breaking errno and exception handling. Perhaps g++
-ffast-math or icpc -fp-model fast=2 includes such an option. Those
are big hammers which evidently aren't suitable for general use.

I need these answers in order to compare differences, if there are any,
in terms of performance and results.

If you want strictly correct results, according to IEEE 754, you must
consider whether you want your operations to be widened, e.g. according
to the x87 precision setting, and, with icpc, you must set -prec-sqrt.
These issues are separate from whether you get an in-line sqrt
instruction, but ignoring them will destroy your conclusions. There is
a clear performance hit in supporting errno and exception handling,
which you normally expect to incur in C++.
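
One way to see whether a given build widens intermediate results is the
C99 macro FLT_EVAL_METHOD from <float.h>. The helper below
(eval_method_name, a made-up name) is only a sketch: it reports what the
compiler assumed at compile time and does not detect run-time changes to
the x87 precision control word.

```c
#include <float.h>

/* Hypothetical helper: describe the compile-time evaluation method. */
static const char *eval_method_name(void)
{
#ifdef FLT_EVAL_METHOD
    switch (FLT_EVAL_METHOD) {
    case 0:  return "each type evaluated in its own precision";
    case 1:  return "float and double evaluated as double";
    case 2:  return "all operations evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
#else
    return "FLT_EVAL_METHOD not provided by this compiler";
#endif
}
```

With gcc, a 32-bit x87 build would normally be expected to report the
long double case and an SSE2 build the first case, but check your own
compiler and options rather than relying on that.
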
You have the option of writing in-line asm code, if that is what you 
want to evaluate.
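
A minimal sketch of what such in-line asm might look like for the
remainder case, assuming GCC-style extended asm on an x86 target with
x87 support. The name x87_remainder is hypothetical, and on other
targets the function just calls the library.

```c
#include <math.h>

/* Hypothetical helper: IEEE remainder via the x87 fprem1 instruction. */
static double x87_remainder(double x, double y)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* fprem1 may only partially reduce st(0); it leaves C2 (bit 10 of
       the status word) set while further passes are needed. */
    __asm__ ("1: fprem1\n\t"
             "fnstsw %%ax\n\t"
             "testl $0x400, %%eax\n\t"
             "jnz 1b"
             : "+t" (x)      /* st(0): dividend in, remainder out */
             : "u" (y)       /* st(1): divisor */
             : "ax", "cc");
    return x;
#else
    return remainder(x, y);  /* assumption: no x87 inline asm here */
#endif
}
```

For example, x87_remainder(5.0, 3.0) returns -1.0, matching the C99
remainder() function. Timing this against the library call is then a
matter of your own benchmark harness.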

I imagine it's difficult to get people excited nowadays about something 
specific to 32-bit compilers for a specific CPU architecture, when there 
are no longer any 32-bit CPUs in production.