On 3/25/2021 2:16 PM, Stefan Franke wrote:
At least it seems possible to use auto_inc inside an emitted loop, since that yields a separate bb...
I wouldn't rely on that.
Loop unrolling and auto_inc (post_inc) do not play well together, since there are two issues. Consider these mem refs, with mode size 4:

    a[0] = ...
    a[4] = ...
    a[8] = ...
    a[12] = ...

Loop unrolling does something like

    b = a
    b[0] = ...
    b[4] = ...
    b[8] = ...
    b[12] = ...
    b = b + 16
    b[0] = ...
    ...

1. cse folds the memory refs from b to a, but not the [4]:

    b = a
    a[0] = ...
    b[4] = ...
    a[8] = ...
    a[12] = ...
    ...

And you end up with one post_inc in the beginning and the rest without.
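For readers less familiar with the addressing modes involved, here is a minimal sketch in plain C of the two shapes the stores can take. It is not GCC code, and the names `store_offsets`, `store_post_inc_chain` and `forms_agree` are made up for illustration. The first function uses one base pointer with constant displacements, and the second advances the pointer after every store, which is the shape a post_inc chain expresses. Which addressing mode the compiler actually picks is up to the port and the passes, not the source form.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { N_REFS = 4 };  /* four stores of a 4-byte mode, as in the example */

/* Offset form: one base pointer, constant displacements.
   In byte terms these are a[0], a[4], a[8], a[12].  */
void store_offsets(int32_t *a, int32_t v)
{
    a[0] = v;
    a[1] = v + 1;
    a[2] = v + 2;
    a[3] = v + 3;
}

/* Post-increment form: every store advances the pointer by the
   mode size, so no displacement is needed.  */
void store_post_inc_chain(int32_t *a, int32_t v)
{
    int32_t *p = a;
    *p++ = v;
    *p++ = v + 1;
    *p++ = v + 2;
    *p++ = v + 3;
}

/* Returns 1 if both forms leave identical memory contents.  */
int forms_agree(int32_t v)
{
    int32_t m1[N_REFS] = {0}, m2[N_REFS] = {0};

    assert(v <= INT32_MAX - 3);  /* keep v + 3 from overflowing */
    store_offsets(m1, v);
    store_post_inc_chain(m2, v);
    return memcmp(m1, m2, sizeof m1) == 0;
}
```

Calling `forms_agree(10)` compares the two buffers after both sets of stores. The mixed result described in point 1 corresponds to a sequence where only the first store can take the second shape.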
So that argues that you want to fix up CSE and possibly your port's costing model.
My workaround here is to consider DF_REG_USE_COUNT and DF_REG_DEF_COUNT to decide whether b[4] should be folded too:

    if (DF_REG_USE_COUNT(REGNO(folded_arg0)) <= 2
        || DF_REG_DEF_COUNT(REGNO(folded_arg0)) > 1)
      break;

2. auto-inc-dec does not yet handle mem refs with an offset that are followed by a matching add. Since the above pattern as insns looks like

    b = a + x
    *b = ...
    a = a + x + 4

and is detected as PRE_ADD, I convert it into

    a = a + x
    *a = ...
    a = a + 4

which is now a POST_INC, update the variables, and auto-inc-dec generates post increments up to the top, where x gets zero.

=> there is room for improvements^^
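The rewrite in point 2 can be sanity-checked outside the compiler. The following is a minimal sketch in plain C, not GCC code: the names `seq_pre_add`, `seq_post_inc` and `sequences_agree` are hypothetical, byte pointers stand in for the address registers a and b, and the mode size of 4 is taken from the example. It only shows that both insn sequences store to the same address and leave a with the same final value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MODE_SIZE = 4, BUF_SIZE = 32 };  /* assumed sizes for the sketch */

/* Sequence as found: b = a + x; *b = v; a = a + x + 4.
   Returns the updated value of a.  */
unsigned char *seq_pre_add(unsigned char *a, ptrdiff_t x, int32_t v)
{
    unsigned char *b = a + x;
    memcpy(b, &v, MODE_SIZE);
    return a + x + MODE_SIZE;
}

/* Rewritten sequence: a = a + x; *a = v; a = a + 4.
   The store followed by an add of the mode size is the shape
   that a post-increment address can express.  */
unsigned char *seq_post_inc(unsigned char *a, ptrdiff_t x, int32_t v)
{
    a = a + x;
    memcpy(a, &v, MODE_SIZE);
    a = a + MODE_SIZE;
    return a;
}

/* Returns 1 if both sequences write the same bytes and produce
   the same final offset for a.  */
int sequences_agree(ptrdiff_t x, int32_t v)
{
    unsigned char m1[BUF_SIZE] = {0}, m2[BUF_SIZE] = {0};
    unsigned char *r1, *r2;

    assert(x >= 0 && x + MODE_SIZE <= BUF_SIZE);  /* stay inside the buffer */
    r1 = seq_pre_add(m1, x, v);
    r2 = seq_post_inc(m2, x, v);
    return (r1 - m1) == (r2 - m2) && memcmp(m1, m2, BUF_SIZE) == 0;
}
```

For example, `sequences_agree(8, -1)` runs both sequences with x = 8 and compares the resulting memory and pointer. This says nothing about whether the pass may legally perform the rewrite in a given context; that still depends on the other uses and defs of a and b.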
Yup, and that's where I'd focus my efforts. Emitting auto-increment addressing modes at a point where the compiler isn't expecting them is just asking for trouble. It may work today, it may work for a decade, but experience shows that if you break the rules it will break one day.
jeff