Hi Mark,
Mark Wielaard wrote:
Hi Ian,
On Sat, 2007-06-30 at 15:25 -0400, Ian Rogers wrote:
public static int compare(float x, float y)
{
int ix = floatAsIntBits(x);
int iy = floatAsIntBits(y);
if (ix == iy) return 0;
if (isNaN(x)) return 1;
if (isNaN(y)) return -1;
int ix_sign = ix>>31;
int iy_sign = iy>>31;
if (ix_sign == iy_sign) {
return x > y ? 1 : -1;
} else {
return ix_sign - iy_sign;
}
}
I am not really comfortable with writing this as is since unless the
compiler is smart you will do a multiple isNaN() and floatToIntBits()
calls even if not necessary. Could we try to do something smarter and
factor some of this out like:
// First check for NaNs
if (x != x || y != y)
{
int ix = floatToIntBits(x);
int iy = floatToIntBits(y);
if (ix == iy)
return 0;
if (x != x)
return 1;
if (y != y)
return -1;
}
else
{
// Normal cases, or positive/negative zeros
if (x > y)
return 1;
if (y > x)
return -1;
// Either equal or positive/negative zeros
if (x == 0 && y == 0)
{
int ix = floatToIntBits(x);
int iy = floatToIntBits(y);
int ix_sign = ix >> 31;
int iy_sign = iy >> 31;
return ix_sign - iy_sign;
}
return 0;
}
But maybe I am not giving the compiler/jit enough credit for actually
producing such an optimized version in the first place. And this isn't
actually the same as your code since I moved the NaN checks first again.
The above assumes the NaN test (x != x) is relatively cheap since NaN is
a canonical value is java. That might be a wrong assumption. You might
be able to do something more clever with the actual ordering of floats
according to > (since if any argument is NaN or both are zero it
evaluates to false) or with the ordering of floatToIntBits, but my head
was spinning trying to keep all the corner cases right (this calls for
expanding the Mauve Float.compareTo() test).
Thanks for the feedback. I think your code may well be better. I think
its good to have identified that there is at least an ~8% potential
speed up on IA32 with my code, and potentially better with yours. There
is of course a big dependence on what values are being compared and on
the architecture. For the arrays of floats I'm seeing in the Jikes RVM a
lot of the values are identical (either 0.0 or 1.0). With your code that
has a path of 6 compares to get through. With mine its just one compare,
but because a floating point compare of 0.0 and -0.0 will yield that
they are identical I'm forced into doing it on their int bit
representation to keep the 0.0 case sane.
Here's my hand wavy costing of the now 3 ways we have a doing things:
Case: x == y neither are NaN or 0.0
Newer Cost: 5 float compares
New Cost: 2 float as ints, 1 int compare
Old Cost: 5 float compares
Case: x < y and different sign
Newer Cost: 4 float compares
New Cost: 2 float as ints, 2 int compares, 2 float compares, 2 shifts,
one subtract
Old Cost: 6 float compares
Case: x < y and same sign
Newer Cost: 4 float compares
New Cost: 2 float as ints, 2 int compares, 3 float compares, 2 shifts
Old Cost: 6 float compares
Case: x is NaN
Newer Cost: 2 float compares (I think you could reuse the original code
which is slightly cheaper than yours)
New Cost: 2 float as ints, 1 int compare, 1 float compare
Old Cost: 2 float compares
Case: y is NaN
Newer Cost: 2 float compares (again assuming reuse of the original code)
New Cost: 2 float as ints, 1 int compare, 2 float compares
Old Cost: 2 float compares
Case: x == 0.0 and y == 0.0
Newer Cost: 2 float as ints, 6 float compares, 2 shifts, one subtract
New Cost: either 2 float as ints, 1 int compare or 2 float as ints, 2
int compares, 2 float compares, 2 shifts, one subtract
Old Cost: 4 float compares, 2 float divides, 1 float subtract, 1 float
to int
So I think the cases where the different ways win out are: most values
are </> then the newer way has the shortest path, most values are ==
then the new way has the shortest path, for two zeros (common in Jikes
RVM :-( ) the new way wins out, and no one really cares about NaNs :-)
I agree about the desirable properties of your code, and I think they
also address Pinski's concern, that floatToIntBits on the common path
seems bad. I also agree that its good to have a short < and > path.
Possibly the only redeeming feature of the new code was that it made ==
cheap (depending on the floatToIntBits cost). So my variant of your code
would be:
// First check for NaNs
if (x != x) {
if (y != y) return 0;
else return 1;
}
else if (y != y) {
return -1;
}
else
{
// Normal cases, or positive/negative zeros
if (x > y)
return 1;
if (y > x)
return -1;
// Either equal or positive/negative zeros
if (x == 0 && y == 0)
{
int ix = floatToIntBits(x);
int iy = floatToIntBits(y);
int ix_sign = ix >> 31;
int iy_sign = iy >> 31;
return ix_sign - iy_sign;
}
return 0;
}
I'll wait for some more feedback on the list before sending this in as a
patch. I guess in any case we should separate this out in
VMFloat/VMDouble as I can imagine for a CPU lacking an FPU you'd want
another choice.
Regards,
Ian