I suspect that SSE is being used for the arrays because they happen to be
appropriately sized (four floats fit in one 128-bit SSE register). If I
remember correctly, gcc 4 was to introduce some autovectorization. Perhaps
that's what's going on.

Brian

On Fri, 25 Mar 2005 10:39:07 +0000, Asfand Yar Qazi <email@xxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> Consider the following little bunch of code (prepared specially for
> this question:)
>
> -------------CODE---------------
> #ifdef __cplusplus
> extern "C"
> #endif
> int printf(const char *format, ...);
>
> /*
>
> Lesson: store stuff either as variables, or in arrays - not both!
>
> */
>
> float a[4] = {0.123, -0.231, 0.652, 1};
> float b[4] = {-0.523, -9.6421, 0.0123, 1};
>
> float a0 = 0.123, a1 = -0.231, a2 = 0.652, a3 = 1;
> float b0 = -0.523, b1 = -9.6421, b2 = 0.0123, b3 = 1;
>
> const int loopcount = 1000000;
>
> void
> thefunc1(void)
> {
>     int i;
>     for(i = 0; i < loopcount; ++i) {
>         a[0] = (b[0] * 0.9999f) + (b[0] * 0.00001f);
>         a[1] = (b[1] * 0.9999f) + (b[1] * 0.00001f);
>         a[2] = (b[2] * 0.9999f) + (b[2] * 0.00001f);
>
>         b[0] = a[0];
>         b[1] = a[1];
>         b[2] = a[2];
>     }
> }
>
> void
> thefunc2(void)
> {
>     int i;
>     for(i = 0; i < loopcount; ++i) {
>         a0 = (b0 * 0.9999f) + (b0 * 0.00001f);
>         a1 = (b1 * 0.9999f) + (b1 * 0.00001f);
>         a2 = (b2 * 0.9999f) + (b2 * 0.00001f);
>
>         b0 = a0;
>         b1 = a1;
>         b2 = a2;
>     }
> }
>
> int
> main()
> {
>     thefunc1();
>     thefunc2();
>
>     printf("t1: [%.24e, %.24e, %.24e, %.24e]\n", a[0], a[1], a[2], a[3]);
>     printf("t2: [%.24e, %.24e, %.24e, %.24e]\n", a0, a1, a2, a3);
>
>     return 0;
> }
> -------------CODE---------------
>
> Now, consider the output, using gcc 4 cvs and gcc 3.4.3 (compilation
> flags: -ffast-math -march=pentium3 -msse -mfpmath=387 -O3
> -fno-unroll-loops )
>
> ./gcc_error-3.4.3
> t1: [-4.255309033630528751103380e-40, -7.845134420060880251139727e-39,
> 1.000768933567725568230718e-41, 1.000000000000000000000000e+00]
> t2: [-4.255309033630528751103380e-40, -7.845134420060880251139727e-39,
> 1.000807363220784352053728e-41, 1.000000000000000000000000e+00]
>
> ./gcc_error-4
> t1: [-4.255302206601370422491529e-40, -7.845133972967615879964511e-39,
> 1.000768933567725568230718e-41, 1.000000000000000000000000e+00]
> t2: [-4.255309033630528751103380e-40, -7.845134420060880251139727e-39,
> 1.000807363220784352053728e-41, 1.000000000000000000000000e+00]
>
> Note how on gcc 3.4.3, using variables or arrays of floats gives the
> same results. However, on gcc 4, it seems this is no longer the case
> (using the above flags, anyhow.)
>
> It's not a bug, I assume, but could someone explain to me why this
> is happening?
>
> Then, observe the following with sse fpmath (flags: -ffast-math
> -march=pentium3 -msse -mfpmath=sse -O3 -fno-unroll-loops)
>
> ./gcc_error-3.4.3
> t1: [-4.255449163476961232810473e-40, -7.845069960331521309554464e-39,
> 9.890364561204558886579683e-42, 1.000000000000000000000000e+00]
> t2: [-4.255449163476961232810473e-40, -7.845069960331521309554464e-39,
> 9.890364561204558886579683e-42, 1.000000000000000000000000e+00]
>
> ./gcc_error-4
> t1: [-4.255449163476961232810473e-40, -7.845069960331521309554464e-39,
> 9.890364561204558886579683e-42, 1.000000000000000000000000e+00]
> t2: [-4.255449163476961232810473e-40, -7.845069960331521309554464e-39,
> 9.890364561204558886579683e-42, 1.000000000000000000000000e+00]
>
> Using SSE seems to give the same answers using variables or arrays.
> Wha?!?!?! Now I'm even more confused.
>
> Could someone explain the above to me? Why is there a difference
> using arrays or variables in 387 maths, and not in SSE maths?
>
> Also, is it better (i.e. more efficient, faster, etc.) to use
> variables or arrays to hold the data in matrix/vector classes in C++?
>
> Thanks,
>     Asfand Yar
>
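
For what it's worth, here is a minimal sketch of how the same recurrence can end up with different low digits depending on where the intermediate values get rounded. It is not a diagnosis of what gcc 4 actually emits for the program above. The names decay_rounded and decay_wide are made up for illustration: decay_rounded forces every intermediate through a 32-bit float (roughly what scalar SSE math does), while decay_wide keeps the running value in a double, used here only as a stand-in for a value that stays in a wider 387 register between iterations.

```c
/* Hypothetical helper: every intermediate is rounded to a 32-bit float.
   The volatile stores stop the compiler from carrying extra precision
   from one operation to the next. */
float decay_rounded(float start, int steps)
{
    volatile float b = start;
    volatile float t1, t2;
    int i;

    for (i = 0; i < steps; ++i) {
        t1 = b * 0.9999f;    /* rounded to float when stored */
        t2 = b * 0.00001f;   /* rounded to float when stored */
        b = t1 + t2;         /* rounded to float when stored */
    }
    return b;
}

/* Hypothetical helper: same recurrence and same float constants, but the
   running value stays in a double (an assumption standing in for a wider
   register) and is rounded to float only once, at the end. */
float decay_wide(float start, int steps)
{
    double b = start;
    int i;

    for (i = 0; i < steps; ++i)
        b = b * (double)0.9999f + b * (double)0.00001f;

    return (float)b;
}
```

Called with start = -0.523f and steps = 1000000 (the values from the posted program) and printed with %.24e as in the original main(), the two functions should disagree in the low digits, even though both land in the same denormal range around 1e-40.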