-----Original Message-----
> From: Jeff Law <jeffreyalaw@xxxxxxxxx>
> Sent: Thursday, 25 March 2021 16:04
>
> On 3/25/2021 8:50 AM, Stefan Franke wrote:
> >> On 3/25/2021 8:21 AM, Stefan Franke wrote:
> >>> Hi there,
> >>>
> >>> I consider implementing movmemsi/setmemsi for some arch using
> >>> post_inc. Is there a "best practice" for using auto increments in
> >>> such early stages to avoid hickups in cse, gcse, cprop etc.p.p.?
> >> IIRC best practice is not to expose auto-inc until the auto-inc pass
> >> as earlier passes don't know how to deal with them. See "Incdec" in
> >> the developer manual.
> >>
> >>
> >> I'm also not aware of a target where an autoinc happens in contexts
> >> other than in a MEM. So you may run into problems with things that
> >> look like a simple reg->reg move -- insns 8 and 9 in your example.
> >>
> >>
> >> jeff
> > You looked closely! I added the reg note to the register move to
> > enhance some passes to handle this correctly.
> > e.g. in cse.c
> >
> >     if (find_reg_note (insn, REG_INC, dest))
> >       continue;
> >
> > But these modifications aren't the standard way, thus I asked 😊
> >
> > => The routines should emit mems with offset and pray that auto-inc
> > will pick it up?
>
> Yes.
>
> > (Btw: auto-inc-dec does not work well for unrolled loops, so I'm
> > tempted to force the auto-inc stuff...)
>
> That's likely going to lead to a variety of problems. The (documented)
> restriction around auto-inc not being used early in the pipeline has been
> around at least 30 years and passes have been written with that
> assumption. Fixing all of them may be a substantial effort.
>
> It's probably a better use of your time to get a deep understanding of why
> you're not getting the code you want in the presence of unrolling -- there
> may be things we can do in the unroller, auto-inc or passes in the middle
> to improve that.
>
> Jeff

At least it seems possible to use auto_inc inside an emitted loop, since
that yields a separate basic block...

Loop unrolling and auto_inc (post_inc) do not play well together, for two
reasons. Consider these mem refs, with mode size 4:

    a[0] = ...
    a[4] = ...
    a[8] = ...
    a[12] = ...

Loop unrolling does something like

    b = a
    b[0] = ...
    b[4] = ...
    b[8] = ...
    b[12] = ...
    b = b + 16
    b[0] = ...
    ...

1. cse folds the memory refs from b to a, but not the [4] one:

    b = a
    a[0] = ...
    b[4] = ...
    a[8] = ...
    a[12] = ...
    ...

   You end up with one post_inc at the beginning and the rest without.
   My workaround here is to consider DF_REG_USE_COUNT and
   DF_REG_DEF_COUNT to decide whether b[4] should be folded too:

    if (DF_REG_USE_COUNT(REGNO(folded_arg0)) <= 2
        || DF_REG_DEF_COUNT(REGNO(folded_arg0)) > 1)
      break;

2. auto-inc-dec does not yet handle mem refs with an offset that are
   followed by a matching add. As insns, the above pattern looks like

    b = a + x
    *b = ...
    a = a + x + 4

   and is detected as PRE_ADD. I convert it into

    a = a + x
    *a = ...
    a = a + 4

   which is now a POST_INC, and update the variables. auto-inc-dec then
   generates post increments all the way up to the top, where x becomes
   zero.

=> there is room for improvements^^

Stefan
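
P.S. To make the rewrite in point 2 concrete, here is a small standalone C sketch. It is not GCC code: it only models the two insn sequences with plain integers, to show that both forms store to the same address and leave `a` at the same value. The struct and function names are hypothetical and exist only for this illustration; the mode size of 4 is taken from the example above.

```c
#include <assert.h>
#include <stdint.h>

/* Outcome of one store step: where the store landed and the new base. */
struct step {
    uintptr_t store_addr;
    uintptr_t next_a;
};

/* Original shape:  b = a + x;  *b = ...;  a = a + x + 4  */
static struct step offset_form(uintptr_t a, uintptr_t x)
{
    uintptr_t b = a + x;          /* temporary address register */
    struct step s;
    s.store_addr = b;             /* *b = ... */
    s.next_a = a + x + 4;         /* a = a + x + 4 */
    return s;
}

/* Rewritten shape:  a = a + x;  *a = ...;  a = a + 4  */
static struct step post_inc_form(uintptr_t a, uintptr_t x)
{
    struct step s;
    a = a + x;                    /* fold the offset into the base first */
    s.store_addr = a;             /* *a = ... */
    a = a + 4;                    /* candidate for merging into a post_inc */
    s.next_a = a;
    return s;
}

/* Returns 1 when both shapes agree on store address and final base. */
static int forms_agree(uintptr_t a, uintptr_t x)
{
    struct step p = offset_form(a, x);
    struct step q = post_inc_form(a, x);
    return p.store_addr == q.store_addr && p.next_a == q.next_a;
}
```

Calling `forms_agree(a, x)` for any base and offset should return 1, which is the property the conversion relies on. Whether the real pass can then merge the store and the trailing add still depends on the surrounding uses of `a`, which this model does not capture.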