HI Dev -

Make sure that you actually USE the result (rSquared), otherwise a LOT of
optimization can happen to remove some of the (very time critical) code.

I'm counting that 60 billion double precision divides should be happening.
That should be roughly a trillion cycles. Perhaps half of that if SSE2 is
happening. That will take quite a long time, I don't care which compiler it
is -- hand code the assembly, it won't matter.

That is one of the pitfalls of benchmarking code like this. You have to make
sure that the compiler isn't TOO clever :-)

Brian

On Wed, Jul 8, 2009 at 3:32 PM, dgorur<dgorur@xxxxxxxxx> wrote:
>
> In my original code, I have valid data that I read into the x array before
> calling the doLoop(...) function. In an effort to isolate the problem, I
> created this simple test code to help diagnose the problem. While I
> understand your concern, I was not convinced that this was the problem: if
> it were, why should the removal of the print statement consistently reduce
> the time taken? To test this, I added the following code to main, before
> the call to doLoop(...):
>
>     srand(time(NULL));
>     for (i=0;i<DATA_COUNT;i++){
>         for (j=0;j<INPUT_DIM;j++)
>             x[i][j] = (double) rand()/RAND_MAX;
>     }
>
> Sure enough, the results show that this makes no difference:
>
>     [dgorur@mary008-0304-dhcp-217 Desktop]$ ./a.out
>     Beginning loop...
>     Rows processed = 100000.
>     t1 - t0 = 49 seconds.
>     Done. Time elapsed = 49 seconds.
>
> To see if there was something else wrong with my coding or with compiler
> flags, or whether it was a compiler issue, I tested the original code on
> another computer, running a different OS (Linux), and using a different
> version of gcc (4.1.2). This machine also happened to have an Intel
> compiler installed, so I thought I'd see what happened using that. The
> results have made my problem irrelevant. Thank you all for the rapid
> responses. By the way, this *is* on a Linux machine; I found the shell
> script for the funky bash prompt on xkcd.
>
> -Dev
>
> Using the Intel compiler (icc):
>
>     C:\home\dgorur\sparseGP>icc -O3 -funroll-loops sparseGP.c -o sparsegp_icc
>     C:\home\dgorur\sparseGP>./sparsegp_icc
>     This is sparseGP.
>
>     Reading data from databaseLarge.dat...
>     Done. 84000 data points read.
>
>     Beginning correlation loop...
>     Rows processed = 84000,
>     Time elapsed = 0 seconds.
>
> Using gcc:
>
>     C:\home\dgorur\sparseGP>gcc -lm -O3 -funroll-loops sparseGP.c -o sparsegp_gcc
>     C:\home\dgorur\sparseGP>./sparsegp_gcc
>     This is sparseGP.
>
>     Reading data from databaseLarge.dat...
>     Done. 84000 data points read.
>
>     Beginning correlation loop...
>     Rows processed = 84000,
>     Time elapsed = 278 seconds.
>
>
> Tim Prince-3 wrote:
>>
>> dgorur wrote:
>>> Hi,
>>>
>>> I've been getting unpredictable results with gcc -funroll-loops.
>>
>> As the other reply suggested, you can't expect consistent results when
>> working on uninitialized data, and no amount of code tuning will
>> compensate for it.
>>
>> I wonder about your selection of time(); as you're not threading, and
>> appear to be interested in comparing code execution times, why not
>> clock()?
>>
>> An obvious question, which you could answer better for yourself, as you
>> have chosen such a quirky version of gcc, is whether -funroll-loops is
>> producing the apparently desired result of unrolling the inner loop
>> fully. If not, why not use
>>
>>     --param max-unroll-times=6 (3,2)
>>
>> to set a more suitable amount of unrolling?
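
A minimal, self-contained sketch of the pattern Brian describes is below.
The doLoop() body here is a hypothetical stand-in (the real sparseGP.c is
never posted in the thread, and the INPUT_DIM value is a guess); the point
it illustrates is only that the value the timed loop computes must be used
afterwards, e.g. printed, or an optimizing compiler is entitled to delete
the work -- which is one plausible reading of the 0-second icc run above.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define DATA_COUNT 100000   /* matches the "Rows processed" output above */
    #define INPUT_DIM  6        /* guess: the thread never gives this value  */

    double x[DATA_COUNT][INPUT_DIM];

    /* Hypothetical stand-in for the doLoop() discussed above: a
       divide-heavy reduction.  The real loop body is not shown. */
    double doLoop(void)
    {
        double rSquared = 0.0;
        int i, j;
        for (i = 0; i < DATA_COUNT; i++)
            for (j = 0; j < INPUT_DIM; j++)
                rSquared += 1.0 / (1.0 + x[i][j]);
        return rSquared;
    }

    int main(void)
    {
        int i, j;
        time_t t0, t1;
        double rSquared;

        srand(time(NULL));
        for (i = 0; i < DATA_COUNT; i++)
            for (j = 0; j < INPUT_DIM; j++)
                x[i][j] = (double) rand() / RAND_MAX;

        t0 = time(NULL);
        rSquared = doLoop();
        t1 = time(NULL);

        /* Using the result: if rSquared were never read, the compiler
           could legally remove the whole loop and the timing would be
           meaningless. */
        printf("rSquared = %g\n", rSquared);
        printf("Time elapsed = %ld seconds.\n", (long)(t1 - t0));
        return 0;
    }

Compiling the same file with and without the final printf of rSquared is a
quick way to check whether a suspiciously fast result came from the work
being removed rather than from better code generation.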
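
Tim Prince's clock() suggestion is easy to drop in: time() ticks in whole
wall-clock seconds, while clock() reports per-process CPU time scaled by
CLOCKS_PER_SEC, with much finer granularity than one second on typical
systems. A small comparison sketch follows; the loop in it is only a
placeholder workload, not the sparseGP code.

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        clock_t c0;
        time_t  s0;
        volatile double sink = 0.0;  /* volatile: keep the placeholder loop alive */
        long i;

        c0 = clock();
        s0 = time(NULL);

        /* Placeholder workload only -- not the sparseGP loop. */
        for (i = 1; i <= 200000000L; i++)
            sink += 1.0 / (double) i;

        printf("clock(): %.3f CPU seconds\n",
               (double)(clock() - c0) / CLOCKS_PER_SEC);
        printf("time():  %ld wall-clock seconds\n",
               (long)(time(NULL) - s0));
        return 0;
    }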