> > Assembly generated by bfin port (GCC 4.4.0 --target=bfin-elf) (-Os):
> > --------------------------------------------------------------------
> >         P1 = 41 (X);
> >         LSETUP (.L3, .L6) LC1 = P1;       <---- (1)
> >         jump.s .L2;                       <---- (2)
> > .L3:
> >         R2 = [P0++];
> >         R3 = [P5++];
> >         R2 *= R3;
> >         R0 += 1;
> >         R1 = R1 + R2;
> > .L2:
> >         R2 = P2;
> > .L6:
> >         P2 += 1;
> >
> >
> > Here, marked instruction (2) "jump.s" is wrongly generated along with
> > hardware loop.
> >
> ... what exactly you believe is wrong with this.

        LSETUP (.L3, .L6) LC1 = P1;       <---- (1)
        jump.s .L2;                       <---- (2)
.L3:
        R2 = [P0++];
        R3 = [P5++];
        R2 *= R3;
        R0 += 1;
        R1 = R1 + R2;
.L2:
        R2 = P2;                          <---- (3)
.L6:
        P2 += 1;                          <---- (4)

As per my understanding, when control enters the loop, i.e. reaches the
marked instruction (2), it is transferred to the label .L2. It then
executes instructions (3) and (4). After executing these instructions,
control again reaches the first instruction of the loop, i.e. (2). This
is repeated for every iteration of the loop. Thus, the instructions
between labels .L3 and .L2 are never executed.

Please verify my understanding.

On Mon, 2009-10-19 at 16:19 +0100, Bernd Schmidt wrote:
> > Consider the following test case:
> >
> > Test Case Reference:
> > --------------------
> >
> > int siVect[40] ;
> > int siCoeff[40] ;
> > int siSumofDotProduct ;
> > int siIndex1, siIndex2 ;
> >
> > vTestMultipleArrayAccessWithDifferentLoopIndex()
> > {
> >   for (siIndex1=0 ; siIndex1<40 ; siIndex1++)
> >   {
> >     siSumofDotProduct += siVect[siIndex1] * siCoeff[siIndex2];
> >     siIndex2++;
> >   }
> > }
>
> This is not a complete testcase; it needs additional code to call this
> function and test the result.  When I added this, it produced the same
> result with -O0, -O2 and -Os, hence I don't know...
>
> >
> > Assembly generated by bfin port (GCC 4.4.0 --target=bfin-elf) (-Os):
> > --------------------------------------------------------------------
> >         P1 = 41 (X);
> >         LSETUP (.L3, .L6) LC1 = P1;       <---- (1)
> >         jump.s .L2;                       <---- (2)
> > .L3:
> >         R2 = [P0++];
> >         R3 = [P5++];
> >         R2 *= R3;
> >         R0 += 1;
> >         R1 = R1 + R2;
> > .L2:
> >         R2 = P2;
> > .L6:
> >         P2 += 1;
> >
> >
> > Here, marked instruction (2) "jump.s" is wrongly generated along with
> > hardware loop.
>
> ... what exactly you believe is wrong with this.
>
>
> Bernd
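
Regarding the point that the test case is not complete: below is a minimal
sketch of what the missing driver code could look like. It is only an
illustration, not the code that was actually used. The checker name
iCheckDotProduct, the input values and the return codes are assumptions made
up for this example; the function under test is the one posted above, with an
explicit "void" return type added so that it compiles cleanly.

```c
int siVect[40];
int siCoeff[40];
int siSumofDotProduct;
int siIndex1, siIndex2;

/* Function under test, as posted (explicit return type added). */
void vTestMultipleArrayAccessWithDifferentLoopIndex(void)
{
    for (siIndex1 = 0; siIndex1 < 40; siIndex1++) {
        siSumofDotProduct += siVect[siIndex1] * siCoeff[siIndex2];
        siIndex2++;
    }
}

/*
 * Hypothetical checker (not part of the original test case).
 * Fills both arrays with arbitrary values, computes the expected
 * dot product with a plain single-index loop, runs the function
 * under test and compares the results.
 * Returns 0 on success, non-zero on a mismatch.
 */
int iCheckDotProduct(void)
{
    int i;
    int expected = 0;

    for (i = 0; i < 40; i++) {
        siVect[i] = i + 1;        /* arbitrary input data */
        siCoeff[i] = 2 * i - 7;   /* arbitrary input data */
        expected += siVect[i] * siCoeff[i];
    }

    /* Start from a known state; both indices walk 0..39 in lockstep. */
    siSumofDotProduct = 0;
    siIndex2 = 0;

    vTestMultipleArrayAccessWithDifferentLoopIndex();

    if (siSumofDotProduct != expected)
        return 1;                 /* wrong sum: loop body skipped or repeated */
    if (siIndex1 != 40 || siIndex2 != 40)
        return 2;                 /* wrong iteration count */
    return 0;
}
```

A main() that simply returns iCheckDotProduct() would make the exit status
show whether the loop body ran for all 40 iterations. If the instructions
between .L3 and .L2 were really never executed, the sum would stay at 0 and
the check would fail. Building the same file with -O0, -O2 and -Os then lets
the results be compared directly.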