Re: Re: setmemsi, movmemsi and post_inc

-----Original Message-----
> From: Jeff Law <jeffreyalaw@xxxxxxxxx>
> Sent: Thursday, 25 March 2021 16:04
> On 3/25/2021 8:50 AM, Stefan Franke wrote:
> >> On 3/25/2021 8:21 AM, Stefan Franke wrote:
> >>> Hi there,
> >>>
> >>> I consider implementing movmemsi/setmemsi for some arch using
> >>> post_inc. Is there a "best practice" for using auto increments in
> >>> such early stages to avoid hiccups in cse, gcse, cprop etc.?
> >> IIRC best practice is not to expose auto-inc until the auto-inc pass
> >> as earlier passes don't know how to deal with them.  See "Incdec" in
> >> the developer manual.
> >>
> >>
> >> I'm also not aware of a target where an autoinc happens in contexts
> >> other than in a MEM.  So you may run into problems with things that
> >> look like a simple reg->reg move -- insns 8 and 9 in your example.
> >>
> >>
> >> jeff
> > You looked closely! I added the reg note to the register move to enhance
> > some passes to handle this correctly.
> > e.g. in cse.c
> >
> >    if (find_reg_note (insn, REG_INC, dest))
> >      continue;
> >
> >   But these modifications aren't the standard way, thus I asked 😊
> >
> > => The routines should emit mems with offset and pray that auto-inc will
> > pick it up?
> 
> Yes.
> 
> 
> > (Btw: auto-inc-dec does not work well for unrolled loops, so I'm
> > tempted to force the auto-inc stuff...)
> 
> That's likely going to lead to a variety of problems.   The (documented)
> restriction around auto-inc not being used early in the pipeline has been
> around at least 30 years and passes have been written with that
> assumption.  Fixing all of them may be a substantial effort.
> 
> It's probably a better use of your time to get a deep understanding of why
> you're not getting the code you want in the presence of unrolling -- there
> may be things we can do in the unroller, auto-inc or passes in the middle to
> improve that.
> 
> Jeff

At least it seems possible to use auto_inc inside an emitted loop, since that yields a separate basic block...

Loop unrolling and auto_inc (post_inc) do not play well together; there are two issues. Consider these mem refs, with mode size 4:
  a[0] = ...
  a[4] = ...
  a[8] = ...
  a[12] = ...

Loop unrolling does something like
  b = a
  b[0] = ...
  b[4] = ...
  b[8] = ...
  b[12] = ...
  b = b + 16
  b[0] = ...
  ...
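
For illustration only, here is a minimal C-level sketch of the two shapes (not GCC-internal code; the names fill_offsets, fill_postinc and same_words are made up for this example, and the offsets are word indices rather than the byte offsets used above). The first function mirrors what the unroller produces, one base pointer with constant displacements and a single add; the second is what I would like to get with post_inc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset form: constant displacements from b, one add per unrolled
   iteration.  This sketch has no remainder loop, so n_words must be a
   multiple of 4 (an assumption of the example).  */
void fill_offsets (uint32_t *a, size_t n_words, uint32_t v)
{
  assert (n_words % 4 == 0);
  uint32_t *b = a;
  for (size_t i = 0; i < n_words; i += 4)
    {
      b[0] = v;
      b[1] = v;
      b[2] = v;
      b[3] = v;
      b += 4;
    }
}

/* Post-increment form: every store advances the pointer itself, which
   maps directly onto a post_inc addressing mode where the target has one.  */
void fill_postinc (uint32_t *a, size_t n_words, uint32_t v)
{
  assert (n_words % 4 == 0);
  uint32_t *p = a;
  for (size_t i = 0; i < n_words; i += 4)
    {
      *p++ = v;
      *p++ = v;
      *p++ = v;
      *p++ = v;
    }
}

/* Helper for comparing the results: returns 1 if both buffers hold the
   same n words, 0 otherwise.  */
int same_words (const uint32_t *x, const uint32_t *y, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (x[i] != y[i])
      return 0;
  return 1;
}
```

Both functions write the same memory; the difference is only in how the addresses are formed, which is exactly what the later passes have to recover.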

1. cse folds the memory refs from b to a, but not the one at [4]:
  b = a
  a[0] = ...
  b[4] = ...
  a[8] = ...
  a[12] = ...
...
You end up with one post_inc at the beginning and the rest without.

My workaround here is to consider DF_REG_USE_COUNT and DF_REG_DEF_COUNT when deciding whether b[4] should be folded too:
  if (DF_REG_USE_COUNT (REGNO (folded_arg0)) <= 2
      || DF_REG_DEF_COUNT (REGNO (folded_arg0)) > 1)
    break;

2. auto-inc-dec does not yet handle mem refs with an offset that are followed by a matching add.
Since the above pattern looks like this as insns:
  b = a + x
  *b = ...
  a = a + x + 4
and is detected as PRE_ADD, I convert it into
  a = a + x
  *a = ...
  a = a + 4
which is now a POST_INC, and update the variables. auto-inc-dec then generates post increments all the way up to the top, where x becomes zero.
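
To make the equivalence of the two forms concrete, here is a small standalone C model (again not GCC code; struct step, step_pre_add, step_post_inc and MEM_SIZE are hypothetical names, and addresses are modelled as indices into a byte array). It assumes b has no other uses after the store, otherwise dropping b would not be valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MEM_SIZE = 64 };  /* Size of the modelled memory (assumption).  */

/* Outcome of one step: where the store landed and the new value of a.  */
struct step
{
  size_t store_at;
  size_t a_after;
};

/* Form seen by auto-inc-dec and detected as a pre-add:
     b = a + x;  *b = v;  a = a + x + 4  */
struct step step_pre_add (uint8_t *mem, size_t a, size_t x, uint8_t v)
{
  assert (a + x < MEM_SIZE);
  size_t b = a + x;
  mem[b] = v;
  a = a + x + 4;
  return (struct step) { b, a };
}

/* Rewritten form, store followed by a plain increment of a:
     a = a + x;  *a = v;  a = a + 4  */
struct step step_post_inc (uint8_t *mem, size_t a, size_t x, uint8_t v)
{
  assert (a + x < MEM_SIZE);
  a = a + x;
  mem[a] = v;
  size_t at = a;
  a = a + 4;
  return (struct step) { at, a };
}
```

For the same inputs both steps store to the same location and leave a with the same value; only the second one has the store-then-add-4 shape that can become a post increment.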

=> there is room for improvements^^

Stefan







