In my original code, I have valid data that I read into the x array before calling the doLoop(...) function. To isolate the problem, I created this simple test code.

While I understand your concern, I was not convinced that it was the cause: if it were, why would removing the print statement consistently reduce the time taken? To test this, I added the following code to main, before the call to doLoop(...):

    /* Fill x with uniform random values in [0, 1]. */
    srand(time(NULL));
    for (i = 0; i < DATA_COUNT; i++) {
        for (j = 0; j < INPUT_DIM; j++)
            x[i][j] = (double) rand() / RAND_MAX;
    }

Sure enough, the results show that this makes no difference:

    [dgorur@mary008-0304-dhcp-217 Desktop]$ ./a.out
    Beginning loop...
    Rows processed = 100000.
    t1 - t0 = 49 seconds.
    Done. Time elapsed = 49 seconds.

To see whether something else was wrong with my code or compiler flags, or whether it was a compiler issue, I tested the original code on another computer running a different OS (Linux) and a different version of gcc (4.1.2). This machine also happened to have the Intel compiler installed, so I thought I'd see what happened with that. The results have made my problem irrelevant.

Thank you all for the rapid responses. By the way, this *is* on a Linux machine; I found the shell script for the funky bash prompt on xkcd.

-Dev

Using the Intel compiler (icc):

    C:\home\dgorur\sparseGP>icc -O3 -funroll-loops sparseGP.c -o sparsegp_icc
    C:\home\dgorur\sparseGP>./sparsegp_icc
    This is sparseGP.
    Reading data from databaseLarge.dat... Done. 84000 data points read.
    Beginning correlation loop...
    Rows processed = 84000, Time elapsed = 0 seconds.

Using gcc:

    C:\home\dgorur\sparseGP>gcc -lm -O3 -funroll-loops sparseGP.c -o sparsegp_gcc
    C:\home\dgorur\sparseGP>./sparsegp_gcc
    This is sparseGP.
    Reading data from databaseLarge.dat... Done. 84000 data points read.
    Beginning correlation loop...
    Rows processed = 84000, Time elapsed = 278 seconds.
Tim Prince-3 wrote:
>
> dgorur wrote:
>> Hi,
>>
>> I've been getting unpredictable results with gcc -funroll-loops.
> As the other reply suggested, you can't expect consistent results when
> working on uninitialized data, and no amount of code tuning will
> compensate for it.
> I wonder about your selection of time(); as you're not threading, and
> appear to be interested in comparing code execution times, why not clock()?
> An obvious question, which you could answer better for yourself, as you
> have chosen such a quirky version of gcc, is whether -funroll-loops is
> producing the apparently desired result of unrolling the inner loop fully.
> If not, why not use
> --param max-unroll-times=6 (3,2)
> to set a more suitable amount of unrolling?
>
>

--
View this message in context: http://www.nabble.com/Loop-unrolling%3A-black-magic-or-stochastic-process--tp24398027p24400579.html
Sent from the gcc - Help mailing list archive at Nabble.com.