Re: Utilizing GCC Prefetch Analysis -- Instructions not being generated

To clarify the second part:

The objdump output I showed is the expansion of the __builtin_prefetch
intrinsics, obtained with objdump --source -d ./my_program. I was
expecting __builtin_prefetch to expand to ldl, lds, or ldq rather than lda.
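
One thing that may be worth checking is what the words objdump prints as
.long in the quoted dump actually encode. Below is a minimal sketch in C
for decoding them by hand. It assumes the usual Alpha memory-instruction
layout (opcode in bits 31..26, Ra in bits 25..21, Rb in bits 20..16, a
signed 16-bit displacement in bits 15..0). The helper names are
hypothetical, and the opcode list covers only the loads discussed in this
thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of an Alpha memory-format instruction word.
   Assumed layout: opcode 31..26, Ra 25..21, Rb 20..16, disp 15..0. */
struct alpha_mem_insn {
    unsigned opcode;
    unsigned ra;
    unsigned rb;
    int disp;
};

/* Split a 32-bit instruction word into its memory-format fields. */
struct alpha_mem_insn alpha_decode_mem(uint32_t word)
{
    struct alpha_mem_insn insn;
    insn.opcode = (word >> 26) & 0x3f;
    insn.ra = (word >> 21) & 0x1f;
    insn.rb = (word >> 16) & 0x1f;
    insn.disp = (int)(word & 0xffff);
    if (insn.disp & 0x8000)
        insn.disp -= 0x10000;   /* sign-extend the 16-bit displacement */
    return insn;
}

/* Mnemonic for the load opcodes relevant here; NULL for anything else. */
const char *alpha_load_name(unsigned opcode)
{
    switch (opcode) {
    case 0x08: return "lda";
    case 0x22: return "lds";
    case 0x23: return "ldt";
    case 0x28: return "ldl";
    case 0x29: return "ldq";
    default:   return NULL;
    }
}

/* Nonzero if the word is an lds/ldt/ldl/ldq whose destination is
   register 31, i.e. a load whose result is discarded. */
int alpha_is_prefetch_form(uint32_t word)
{
    struct alpha_mem_insn insn = alpha_decode_mem(word);
    return insn.ra == 31 &&
           (insn.opcode == 0x22 || insn.opcode == 0x23 ||
            insn.opcode == 0x28 || insn.opcode == 0x29);
}
```

If the layout assumption holds, the lda line in the dump (0x202f0030)
decodes to opcode 0x08, Ra 1, Rb 15, displacement 48, which matches
"lda t0,48(fp)". The final word of each expansion (0x8be10000 and
0xa3e10000) decodes to lds and ldl with destination register 31, so the
prefetch may be present and simply shown by objdump as raw .long data.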

Thanks,

Malek

On Thu, Aug 21, 2014 at 12:33 AM, Malek Musleh <malek.musleh@xxxxxxxxx> wrote:
> Hi,
>
> I am trying to determine the performance impact of gcc's internal
> software prefetching analysis. I have compiled my benchmarks with the
> following flags:
>
> CFLAGS=-O3 -ffast-math -funroll-loops -fprefetch-loop-arrays
>
> However, after compiling and examining the objdump output of the
> binary, I do not see any inserted prefetch instructions. Specifically,
> I am using an Alpha cross compiler (gcc version 4.2, so I know it has
> prefetching support), and the prefetch instructions that should be
> generated are lds, ldl, or ldq:
>
> http://www.eecg.toronto.edu/~moshovos/ACA05/read/Performance%20tips%20for%20Alpha%20Linux%20C%20programmers.htm
>
>
> My example program code snippet is:
>
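> /* Declarations were not shown in the original snippet; the element
>    type and array sizes here are assumptions. */
> static int a[10000][10000];
> static int b[10001][10000];  /* one extra row: the loop reads b[j+1] */
>
> int main (int argc, char *argv[])
> {
>   int i, j;
>
>   for (i = 0; i < 10000; i++){
>     for (j = 0; j < 10000; j++){
>       a[i][j] = b[j][0] + b[j+1][0];
>     }
>   }
>   return 0;
> }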
>
> The loops are large and regular enough that the analysis pass should
> determine that prefetching is possible. Would anyone know why the
> instructions are not being generated, or whether objdump is simply not
> showing those prefetch instructions?
>
> As a separate note, I did try to use the gcc prefetch intrinsics, and
> examined the objdump:
>
>         __builtin_prefetch (&a[i+j], 1, 1);
>    12000060c:   20 00 4f a0     .long 0xa04f0020
>    120000610:   1c 00 2f a0     .long 0xa02f001c
>    120000614:   01 00 41 40     .long 0x40410001
>    120000618:   01 00 e1 43     .long 0x43e10001
>    12000061c:   42 16 20 40     .long 0x40201642
>    120000620:   30 00 2f 20     lda     t0,48(fp)
>    120000624:   01 04 22 40     .long 0x40220401
>    120000628:   00 00 e1 8b     .long 0x8be10000
>         __builtin_prefetch (&b[i+j], 0, 1);
>    12000062c:   20 00 4f a0     .long 0xa04f0020
>    120000630:   1c 00 2f a0     .long 0xa02f001c
>    120000634:   01 00 41 40     .long 0x40410001
>    120000638:   01 00 e1 43     .long 0x43e10001
>    12000063c:   42 16 20 40     .long 0x40201642
>    120000640:   70 1f 2f 20     lda     t0,8048(fp)
>    120000644:   01 04 22 40     .long 0x40220401
>    120000648:   00 00 e1 a3     .long 0xa3e10000
>
> In this case, it seems that the compiler is generating a different set
> of instructions for the prefetch intrinsic, and not using what the
> Alpha manual says.
>
> Thanks,
>
> Malek
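
For comparison with what the automatic pass is expected to do, here is a
minimal sketch of the same access pattern as the quoted loop with a
hand-placed __builtin_prefetch. The function name, COLS, and
PREFETCH_AHEAD are hypothetical and chosen only for illustration. A
useful prefetch distance depends on the target's memory latency and would
need to be measured.

```c
#include <stddef.h>

#define COLS 16           /* hypothetical row width, kept small here */
#define PREFETCH_AHEAD 8  /* hypothetical prefetch distance, in rows of b */

/* Same access pattern as the quoted inner loop:
   out[j] = b[j][0] + b[j+1][0].  b must have at least n + 1 rows. */
void sum_first_column(int *out, int b[][COLS], size_t n)
{
    for (size_t j = 0; j < n; j++) {
        /* Hint only: read access (0), low temporal locality (1).
           The address expression is kept inside the array bounds. */
        if (j + PREFETCH_AHEAD <= n)
            __builtin_prefetch(&b[j + PREFETCH_AHEAD][0], 0, 1);
        out[j] = b[j][0] + b[j + 1][0];
    }
}
```

The prefetch is only a hint, so the results are the same with or without
it. Because each iteration touches a new row of b, the stride is one full
row, which is the kind of regular pattern a prefetch pass looks for.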



