HI Dev -

Make sure that you actually USE the result (rSquared), otherwise a LOT of
optimization can happen to remove some of the (very time critical) code.

I'm counting that 60 billion double precision divides should be happening.
That should be roughly a trillion cycles. Perhaps half of that if SSE2 is
happening. That will take quite a long time, I don't care which compiler it
is -- hand code the assembly, it won't matter.

That is one of the pitfalls of benchmarking code like this. You have to make
sure that the compiler isn't TOO clever :-)

Brian

On Wed, Jul 8, 2009 at 3:32 PM, dgorur<dgorur@xxxxxxxxx> wrote:
>
> In my original code, I have valid data that I read into the x array before
> calling the doLoop(...) function. In an effort to isolate the problem, I
> created this simple test code to help diagnose the problem. While I
> understand your concern, I was not convinced that this was the problem: if
> it were, why should the removal of the print statement consistently reduce
> the time taken? To test this, I added the following code to main, before
> the call to doLoop(...):
>
>     srand(time(NULL));
>     for (i=0;i<DATA_COUNT;i++){
>         for (j=0;j<INPUT_DIM;j++)
>             x[i][j] = (double) rand()/RAND_MAX;
>     }
>
> Sure enough, the results show that this makes no difference:
>
>     [dgorur@mary008-0304-dhcp-217 Desktop]$ ./a.out
>     Beginning loop...
>     Rows processed = 100000.
>     t1 - t0 = 49 seconds.
>     Done. Time elapsed = 49 seconds.
>
> To see if there was something else wrong with my coding or with compiler
> flags, or whether it was a compiler issue, I tested the original code on
> another computer, running a different OS (Linux), and using a different
> version of gcc (4.1.2). This machine also happened to have an Intel
> compiler installed, so I thought I'd see what happened using that. The
> results have made my problem irrelevant. Thank you all for the rapid
> responses. By the way, this *is* on a Linux machine; I found the shell
> script for the funky bash prompt on xkcd.
>
> -Dev
>
> Using the Intel compiler (icc):
>
>     C:\home\dgorur\sparseGP>icc -O3 -funroll-loops sparseGP.c -o sparsegp_icc
>     C:\home\dgorur\sparseGP>./sparsegp_icc
>     This is sparseGP.
>
>     Reading data from databaseLarge.dat...
>     Done. 84000 data points read.
>
>     Beginning correlation loop...
>     Rows processed = 84000,
>     Time elapsed = 0 seconds.
>
> Using gcc:
>
>     C:\home\dgorur\sparseGP>gcc -lm -O3 -funroll-loops sparseGP.c -o sparsegp_gcc
>     C:\home\dgorur\sparseGP>./sparsegp_gcc
>     This is sparseGP.
>
>     Reading data from databaseLarge.dat...
>     Done. 84000 data points read.
>
>     Beginning correlation loop...
>     Rows processed = 84000,
>     Time elapsed = 278 seconds.
>
>
> Tim Prince-3 wrote:
>>
>> dgorur wrote:
>>> Hi,
>>>
>>> I've been getting unpredictable results with gcc -funroll-loops.
>>
>> As the other reply suggested, you can't expect consistent results when
>> working on uninitialized data, and no amount of code tuning will
>> compensate for it.
>>
>> I wonder about your selection of time(); as you're not threading, and
>> appear to be interested in comparing code execution times, why not
>> clock()?
>>
>> An obvious question, which you could answer better for yourself, as you
>> have chosen such a quirky version of gcc, is whether -funroll-loops is
>> producing the apparently desired result of unrolling the inner loop
>> fully. If not, why not use
>>
>>     --param max-unroll-times=6 (3,2)
>>
>> to set a more suitable amount of unrolling?
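
A minimal, self-contained sketch of the pattern Brian describes is below.
The doLoop() body here is a hypothetical stand-in (the real sparseGP.c is
never posted in the thread, and the INPUT_DIM value is a guess); the point
it illustrates is only that the value the timed loop computes must be used
afterwards, e.g. printed, or an optimizing compiler is entitled to delete
the work -- which is one plausible reading of the 0-second icc run above.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define DATA_COUNT 100000   /* matches the "Rows processed" output above */
    #define INPUT_DIM  6        /* guess: the thread never gives this value  */

    double x[DATA_COUNT][INPUT_DIM];

    /* Hypothetical stand-in for the doLoop() discussed above: a
       divide-heavy reduction.  The real loop body is not shown. */
    double doLoop(void)
    {
        double rSquared = 0.0;
        int i, j;
        for (i = 0; i < DATA_COUNT; i++)
            for (j = 0; j < INPUT_DIM; j++)
                rSquared += 1.0 / (1.0 + x[i][j]);
        return rSquared;
    }

    int main(void)
    {
        int i, j;
        time_t t0, t1;
        double rSquared;

        srand(time(NULL));
        for (i = 0; i < DATA_COUNT; i++)
            for (j = 0; j < INPUT_DIM; j++)
                x[i][j] = (double) rand() / RAND_MAX;

        t0 = time(NULL);
        rSquared = doLoop();
        t1 = time(NULL);

        /* Using the result: if rSquared were never read, the compiler
           could legally remove the whole loop and the timing would be
           meaningless. */
        printf("rSquared = %g\n", rSquared);
        printf("Time elapsed = %ld seconds.\n", (long)(t1 - t0));
        return 0;
    }

Compiling the same file with and without the final printf of rSquared is a
quick way to check whether a suspiciously fast result came from the work
being removed rather than from better code generation.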
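
Tim Prince's clock() suggestion is easy to drop in: time() ticks in whole
wall-clock seconds, while clock() reports per-process CPU time scaled by
CLOCKS_PER_SEC, with much finer granularity than one second on typical
systems. A small comparison sketch follows; the loop in it is only a
placeholder workload, not the sparseGP code.

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        clock_t c0;
        time_t  s0;
        volatile double sink = 0.0;  /* volatile: keep the placeholder loop alive */
        long i;

        c0 = clock();
        s0 = time(NULL);

        /* Placeholder workload only -- not the sparseGP loop. */
        for (i = 1; i <= 200000000L; i++)
            sink += 1.0 / (double) i;

        printf("clock(): %.3f CPU seconds\n",
               (double)(clock() - c0) / CLOCKS_PER_SEC);
        printf("time():  %ld wall-clock seconds\n",
               (long)(time(NULL) - s0));
        return 0;
    }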