To clarify the second part, the objdump I am showing is the expansion of
the __builtin_prefetch intrinsics. I used objdump --source -d ./my_program.
Hence, I was expecting __builtin_prefetch to use ldl, lds, or ldq rather
than lda.

Thanks,

Malek

On Thu, Aug 21, 2014 at 12:33 AM, Malek Musleh <malek.musleh@xxxxxxxxx> wrote:
> Hi,
>
> I am trying to determine the performance impact of gcc's internal
> software prefetching analysis. I have compiled my benchmarks with the
> following flags:
>
> CFLAGS=-O3 -ffast-math -funroll-loops -fprefetch-loop-arrays
>
> However, after compiling and examining the objdump of the binary, I
> do not see any inserted prefetch instructions. Specifically, I am
> using an Alpha cross compiler (gcc version 4.2, so I know it has
> prefetching support), and the prefetch instructions that should be
> generated are: lds, ldl, or ldq
>
> http://www.eecg.toronto.edu/~moshovos/ACA05/read/Performance%20tips%20for%20Alpha%20Linux%20C%20programmers.htm
>
>
> My example program code snippet is:
>
> int main (int argc, char *argv[])
> {
>
>   for (i = 0; i < 10000; i++){
>     for (j = 0; j < 10000; j++){
>       a[i][j] = b[j][0] + b[j+1][0];
>     }
>   }
> }
>
> The loops are large and regular enough that the analysis pass should
> determine that prefetching is possible. Would anyone know why the
> instructions are not being generated, or whether the objdump is not
> capturing those prefetch instructions?
>
> As a separate note, I did try to use the gcc prefetch intrinsics, and
> examined the objdump:
>
>     __builtin_prefetch (&a[i+j], 1, 1);
>    12000060c:   20 00 4f a0     .long 0xa04f0020
>    120000610:   1c 00 2f a0     .long 0xa02f001c
>    120000614:   01 00 41 40     .long 0x40410001
>    120000618:   01 00 e1 43     .long 0x43e10001
>    12000061c:   42 16 20 40     .long 0x40201642
>    120000620:   30 00 2f 20     lda t0,48(fp)
>    120000624:   01 04 22 40     .long 0x40220401
>    120000628:   00 00 e1 8b     .long 0x8be10000
>     __builtin_prefetch (&b[i+j], 0, 1);
>    12000062c:   20 00 4f a0     .long 0xa04f0020
>    120000630:   1c 00 2f a0     .long 0xa02f001c
>    120000634:   01 00 41 40     .long 0x40410001
>    120000638:   01 00 e1 43     .long 0x43e10001
>    12000063c:   42 16 20 40     .long 0x40201642
>    120000640:   70 1f 2f 20     lda t0,8048(fp)
>    120000644:   01 04 22 40     .long 0x40220401
>    120000648:   00 00 e1 a3     .long 0xa3e10000
>
> In this case, it seems that the compiler is generating a different set
> of instructions for the prefetch intrinsic, and not using what the
> Alpha manual says.
>
> Thanks,
>
> Malek
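
One way to check whether the words that objdump left as .long are the
expected prefetches is to decode them by hand. The following is only a
minimal sketch, not taken from gcc or binutils. It assumes the standard
Alpha memory-instruction layout (6-bit opcode, 5-bit Ra, 5-bit Rb, 16-bit
signed displacement), the opcodes 0x08/0x22/0x28/0x29 for lda/lds/ldl/ldq,
and that a prefetch is encoded as an lds/ldl/ldq whose destination is
register 31. All helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of an Alpha memory-format instruction word (assumed layout). */
struct alpha_mem_insn {
    unsigned opcode; /* bits 31..26 */
    unsigned ra;     /* bits 25..21, destination register */
    unsigned rb;     /* bits 20..16, base register */
    int disp;        /* bits 15..0, sign-extended */
};

/* Split a 32-bit instruction word into its memory-format fields. */
static struct alpha_mem_insn alpha_decode_mem(uint32_t word)
{
    struct alpha_mem_insn insn;
    int disp = (int)(word & 0xffffu);

    if (disp & 0x8000)
        disp -= 0x10000; /* sign-extend the 16-bit displacement */
    insn.opcode = (unsigned)((word >> 26) & 0x3fu);
    insn.ra = (unsigned)((word >> 21) & 0x1fu);
    insn.rb = (unsigned)((word >> 16) & 0x1fu);
    insn.disp = disp;
    return insn;
}

/* Mnemonic for the few opcodes relevant here, or NULL if not covered. */
static const char *alpha_load_name(unsigned opcode)
{
    switch (opcode) {
    case 0x08: return "lda";
    case 0x22: return "lds";
    case 0x28: return "ldl";
    case 0x29: return "ldq";
    default:   return NULL;
    }
}

/* Returns 1 if the word is an lds/ldl/ldq targeting register 31. */
static int alpha_is_prefetch(uint32_t word)
{
    struct alpha_mem_insn insn = alpha_decode_mem(word);

    if (insn.ra != 31)
        return 0;
    return insn.opcode == 0x22 || insn.opcode == 0x28 || insn.opcode == 0x29;
}

/* Print one decoded word, using raw register numbers rather than names. */
static void alpha_print(uint32_t word)
{
    struct alpha_mem_insn insn = alpha_decode_mem(word);
    const char *name = alpha_load_name(insn.opcode);

    if (name == NULL) {
        printf("0x%08lx: opcode 0x%02x (not covered by this sketch)\n",
               (unsigned long)word, insn.opcode);
        return;
    }
    printf("0x%08lx: %s r%u,%d(r%u)%s\n", (unsigned long)word, name,
           insn.ra, insn.disp, insn.rb,
           alpha_is_prefetch(word) ? "  <- prefetch form" : "");
}
```

Under these assumptions, alpha_is_prefetch(0x8be10000u) and
alpha_is_prefetch(0xa3e10000u) both return 1: the first word decodes as
opcode 0x22 (lds) and the second as opcode 0x28 (ldl), each with Ra = 31
and base register 1 (t0). The word objdump did decode, 0x202f0030
(lda t0,48(fp)), returns 0. If that reading is right, the final word of
each sequence may already be the prefetch, and the lda before it only
computes the address.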