Re: Loop optimizations

2011/7/19 Ian Lance Taylor <iant@xxxxxxxxxx>:
> Aurelien Buhrig <aurelien.buhrig.gcc@xxxxxxxxx> writes:
>
>> Are there any resources explaining how GCC generates the first RTL
>> from GIMPLE, why some alternatives fail, and what should be done to
>> get what we expect?
>
> Nothing very specific, no.
>
>> For instance, a concrete example with this code:
>>
>> uint32_t M1[16], M2[16];
>>
>> int i;
>> for (i=0; i<16; i++)
>>  M1[i] = M2[i];
>>
>> The generated code is quite horrible... GCC is messed up with
>> SI/PSI/HI trunc and extend, and keeps 2 variables, 1 incremented by 1
>> and shifted 2 for addresses, 1 decremented from 16 to 0 for the
>> loop...
>> I would like GCC to do what it does before I extend Pmode, i.e. extend
>> "i" from HI to PSI, use it to address M1 and M2, increment by
>> sizeof(uint32_t), and test if i != 16*sizeof(uint32_t)...
>>
>> So is this problem related to md file?
>> What should I look at to find such optimization problems?
>
> I don't see offhand why this would be related to RTL expansion.  Look at
> the dump files.  What does the code look like in the last tree dump?
> Look at the dump files for the loop passes in particular; what kind of
> optimizations do they see or fail to see?
>

The last tree dump (final_cleanup) looks like:
<bb 3>:
  M2[i] = M1[i];
  i = i + 1;
  ivtmp.6 = ivtmp.6 - 1;
  if (ivtmp.6 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 4>:

Whereas when Pmode was HImode, it looked like:
<bb 3>:
  MEM[base: &M2 + ivtmp.13] = MEM[base: &M1 + ivtmp.13];
  ivtmp.13 = ivtmp.13 + 2;
  if (ivtmp.13 != 32)
    goto <bb 3>;
  else
    goto <bb 4>;
<bb 4>:

which enables better code generation.
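
For readers who prefer source to dumps, here is a rough C rendering of the two loop shapes. It is only a sketch: the function names (copy_two_ivs, copy_one_iv, self_check) and the test data are made up for illustration, and the step is taken as sizeof(uint32_t). The first function mirrors the Pmode=PSImode form (an index plus a separate down-counter), the second mirrors the Pmode=HImode form (a single byte offset used both for addressing and for the exit test).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 16

uint32_t M1[N], M2[N];

/* Shape of the Pmode=PSImode dump: 'i' indexes the arrays while a
   separate down-counter decides when the loop ends. */
void copy_two_ivs(void)
{
    unsigned int i = 0;
    unsigned int count = N;
    do {
        M2[i] = M1[i];
        i = i + 1;
        count = count - 1;
    } while (count != 0);
}

/* Shape of the Pmode=HImode dump: one byte offset is added to both
   base addresses and is also compared against the total size. */
void copy_one_iv(void)
{
    unsigned int off = 0;
    do {
        *(uint32_t *)((char *)M2 + off) =
            *(const uint32_t *)((const char *)M1 + off);
        off = off + sizeof(uint32_t);
    } while (off != N * sizeof(uint32_t));
}

/* Hypothetical helper: checks that both shapes copy the same data. */
void self_check(void)
{
    unsigned int k;
    for (k = 0; k < N; k++)
        M1[k] = 0x1000u + k;

    memset(M2, 0, sizeof M2);
    copy_two_ivs();
    assert(memcmp(M2, M1, sizeof M1) == 0);

    memset(M2, 0, sizeof M2);
    copy_one_iv();
    assert(memcmp(M2, M1, sizeof M1) == 0);
}
```

Both functions do the same work; the second just needs one induction variable instead of two, which is the form I would like to get back.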

The first differences between the two versions appear in the induction
variable optimization pass (ivopts).
The dump for the new port with Pmode=PSImode is:

;; Function main (main)

main ()
{
  unsigned int ivtmp.6;
  unsigned int i;
  long unsigned int D.1223;

<bb 2>:

<bb 3>:
  # ivtmp.6_1 = PHI <ivtmp.6_9(4), 16(2)>
  # i_15 = PHI <i_6(4), 0(2)>
  D.1223_5 = M1[i_15];
  M2[i_15] = D.1223_5;
  i_6 = i_15 + 1;
  ivtmp.6_9 = ivtmp.6_1 - 1;
  if (ivtmp.6_9 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return 0;

}



while the dump for the old port with Pmode=HImode is:

;; Function main (main)

main ()
{
  uint32_t[16] * D.1243;
  uint32_t[16] * D.1242;
  unsigned int ivtmp.13;
  unsigned int ivtmp.6;
  unsigned int i;
  long unsigned int D.1223;

<bb 2>:

<bb 3>:
  # ivtmp.13_14 = PHI <ivtmp.13_13(4), 0(2)>
  D.1242_17 = &M1 + ivtmp.13_14;
  D.1223_5 = MEM[base: D.1242_17];
  D.1243_18 = &M2 + ivtmp.13_14;
  MEM[base: D.1243_18] = D.1223_5;
  ivtmp.13_13 = ivtmp.13_14 + 4;
  if (ivtmp.13_13 != 64)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return 0;

}


Perhaps GCC is not able to generate
D.1242_17 = &M1 + ivtmp.13_14; (line from the old port)
because &M1 and ivtmp.13_14 no longer have the same mode (they are now
PSImode and HImode, rather than both HImode)?
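
To make that guess concrete, here is a small C model of the width mismatch. It is purely illustrative and says nothing about what ivopts really does internally: the types index_t and addr_t, the function names, and the choice of a 32-bit container for the PSImode address are all assumptions made for the sketch (the real width of PSImode depends on the target).

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t index_t;  /* stands in for an HImode offset            */
typedef uint32_t addr_t;   /* stands in for a wider (PSImode) address   */

/* Old situation: base and offset have the same 16-bit width, so the
   sum is a plain 16-bit add that wraps at 0x10000. */
uint16_t address_same_width(uint16_t base, index_t off)
{
    return (uint16_t)(base + off);
}

/* New situation: the offset must be zero-extended to the address
   width before the add, so an extra conversion sits in the address
   computation and the sum no longer wraps at 16 bits. */
addr_t address_mixed_width(addr_t base, index_t off)
{
    return base + (addr_t)off;
}

/* Hypothetical helper showing that the two computations differ. */
void width_demo(void)
{
    assert(address_same_width(0xFFF0u, 0x20u) == 0x0010u);
    assert(address_mixed_width(0xFFF0u, 0x20u) == 0x10010u);
}
```

In the mixed-width case the offset and the address are different types, so "base + offset" is no longer a single same-mode addition, which may be what stops the offset form from being chosen.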

Is there something I can do to optimize this?
And generally speaking, is tree-level code generation influenced by the
RTL description (the md file) in any way?

Thanks!
Aurélien


