Hello Ian, Thank you for the valuable inputs. >> Or are you saying that that the instruction >> addd $3, (r1,r0) >> may modify r1, and therefore must be executed before the push >> instruction? No, the sequence of the instructions is correct here. What is expected here is (r1,r0) should be loaded with an address value before being added by 3. I looked into the RTL dumps and found the original call sequence, where (r1,r0) is loaded with appropriate address value before adding 3, is valid up to IRA phase. The reload phase is optimizing out the important load instruction (movsi_long). Dump before 'postreload' i.e. at 'ira': =========================================================================== (insn 1613 2944 2945 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32]) (reg:SI 1 r1)) 61 {pushsi_internal} (nil)) (insn 2945 1613 1614 65 Convert.c:707 (set (reg/f:SI 0 r0 [1056]) (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil)) (insn 1614 2945 1615 65 Convert.c:707 (set (reg/f:SI 0 r0 [1056]) (plus:SI (reg/f:SI 0 r0 [1056]) (const_int 3 [0x3]))) 1 {addsi3} (nil)) Dump of 'postreload': =========================================================================== (insn 1613 2944 1614 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32]) (reg:SI 1 r1)) 61 {pushsi_internal} (nil)) (insn 1614 1613 1615 65 Convert.c:707 (set (reg/f:SI 0 r0 [1056]) (plus:SI (reg/f:SI 0 r0 [1056]) (const_int 3 [0x3]))) 1 {addsi3} (nil)) The instruction 2945 is deleted during reload phase. This instruction was loading an address into register pair (r1,r0). And next instruction 1614, adds an offset value 3 into this address. However, as 2945 is deleted, (r1,r0) does not receive the expected address value before 3 is added into it by 1614. Just for the reference, giving here again the inappropriate assembly code generated: addd (sp), (r2,r1) push $2,r1 <<<<<<<<<<<<<<<<<================================== .LCFI548: Insn loading addr in (r1,r0) should be here addd $3, (r1,r0) <<<<<<<<<<<<<<<<<<<<========================== The push command in CR16 is capable of pushing eight adjacent registers onto the stack. The count value in the instruction specifies the number of words to save. So ACCUMULATE_OUTGOING_ARGS may not be efficient when multiple pushes are to be considered. I can not give the complete RTL dump files here. However 2 RTL dumps having part of the concerned code are attached with this mail. Thanks and Regards, Jayant Sonar [KPIT Cummins, Pune] -----Original Message----- From: Ian Lance Taylor [mailto:iant@xxxxxxxxxx] Sent: Saturday, November 19, 2011 10:35 PM To: Jayant R. Sonar Cc: gcc-help@xxxxxxxxxxx Subject: Re: [CR16] Improper register allocation "Jayant R. Sonar" <Jayant.Sonar@xxxxxxxxxxxxxxx> writes: > I am working on CR16 port. > > Found an improper code is being generated for following part of the test > code: > ============================================================================ > IeData = (void *) RAlloc(DataLen); > LAHType* p = (LAHType*)IeData; > > ReadFH (DpEntry, &FeData, &p->Table, &p->Masked, &p->NumAH, &p->Len, TData); > ============================================================================ > > Arguments are pushed on to the stack from right to left direction. > Possible working assembly that could be generated for this code: > Case 1: > ================================================================ > bal (ra), _RAlloc@c > stord (r1,r0), 32(sp) > addd $-2, (sp) > .LCFI547: > movd $18, (r7,r6) > addd (sp), (r7,r6) > push $2,r6 > .LCFI548: > addd $3, (r1,r0) > push $2,r0 > > OR lesser optimized code could be: > Case 2: > ============================================================ > bal (ra), _RAlloc@c > stord (r1,r0), 32(sp) > addd $-2, (sp) > .LCFI547: > movd $18, (r7,r6) > addd (sp), (r7,r6) > push $2,r6 > .LCFI548: > loadd 32(sp), (r1,r0) > addd $3, (r1,r0) > push $2,r0 > > However, I am getting following code: > ============================================================ > bal (ra), _RAlloc@c > stord (r1,r0), 20(sp) > addd $-2, (sp) > .LCFI547: > movd $18, (r2,r1) > addd (sp), (r2,r1) > push $2,r1 > .LCFI548: > addd $3, (r1,r0) <<============== PROBLEM > push $2,r0 > > 'R1' being the part of first push already has some value in it. So before > next push, either (r1,r0) should be loaded with new address value as shown > in Case 2. Otherwise, as in Case 1, different register pairs should be used > for these two push operations. > > CR16 port did not have REG_ALLOC_ORDER defined. > I am trying to control register allocation order using this. However, it > looks like this sequence has to be decided very carefully as slightest of > the alternation in sequence are causing some or other application failures. > > Is there any other way, I could address this problem? I don't know this syntax. Are you saying that the instruction push $2,r1 modifies r1? Or are you saying that that the instruction addd $3, (r1,r0) may modify r1, and therefore must be executed before the push instruction? Either way, REG_ALLOC_ORDER is not an issue here. You need to look at the RTL dumps. See whether the original call sequence in the first RTL dump is valid, using pseudo-registers. Then find out where it becomes invalid. One thing to consider is whether the push instruction is particularly efficient on your processor. If it is not--if it is comparable to a mov instruction to an offset from sp--then consider defining ACCUMULATE_OUTGOING_ARGS rather than PUSH_ARGS. Most processors get more efficient code from ACCUMULATE_OUTGOING_ARGS. Ian
(call_insn 1604 1602 1605 65 RcHeap.h:351 (parallel [ (set (reg:SI 0 r0) (call (mem:QI (symbol_ref:SI ("RAlloc") [flags 0x41] <function_decl 0xb7dfc980 RAlloc>) [0 S1 A8]) (const_int 0 [0x0]))) (clobber (reg:SI 14 ra)) ]) 96 {cr16_call_value_insn_branch} (nil) (expr_list:REG_DEP_TRUE (use (reg:HI 2 r2)) (expr_list:REG_DEP_TRUE (use (reg:HI 3 r3)) (expr_list:REG_DEP_TRUE (use (reg:SI 4 r4)) (nil))))) (insn 1605 1604 1607 65 RcHeap.h:351 (set (reg/f:SI 12 r12 [orig:525 D.18812 ] [525]) (reg:SI 0 r0)) 67 {*movsi_long} (nil)) (insn 1607 1605 1609 65 Convert.c:704 (set (mem/f/c/i:SI (plus:SI (reg/f:SI 15 sp) (const_int 20 [0x14])) [7 IeData+0 S4 A32]) (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil)) (debug_insn 1609 1607 1610 65 Convert.c:706 (var_location:SI p (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) -1 (nil)) (insn 1610 1609 2943 65 Convert.c:707 (set (reg/f:SI 15 sp) (plus:SI (reg/f:SI 15 sp) (const_int -2 [0xfffffffe]))) 1 {addsi3} (nil)) (insn 2943 1610 2944 65 Convert.c:707 (set (reg:SI 1 r1) (const_int 18 [0x12])) 67 {*movsi_long} (nil)) (insn 2944 2943 1613 65 Convert.c:707 (set (reg:SI 1 r1) (plus:SI (reg:SI 1 r1) (reg/f:SI 15 sp))) 1 {addsi3} (expr_list:REG_EQUIV (plus:SI (reg/f:SI 15 sp) (const_int 18 [0x12])) (nil))) (insn 1613 2944 2945 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32]) (reg:SI 1 r1)) 61 {pushsi_internal} (nil)) (insn 2945 1613 1614 65 Convert.c:707 (set (reg/f:SI 0 r0 [1056]) (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil)) (insn 1614 2945 1615 65 Convert.c:707 (set (reg/f:SI 0 r0 [1056]) (plus:SI (reg/f:SI 0 r0 [1056]) (const_int 3 [0x3]))) 1 {addsi3} (nil)) (insn 1615 1614 2946 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32]) (reg/f:SI 0 r0 [1056])) 61 {pushsi_internal} (nil)) (insn 2946 1615 1616 65 Convert.c:707 (set (reg/f:SI 0 r0 [1057]) (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil)) (insn 1616 2946 1617 65 Convert.c:707 (set (reg/f:SI 0 r0 [1057]) (plus:SI (reg/f:SI 0 r0 [1057]) (const_int 2 [0x2]))) 1 {addsi3} (nil)) (insn 1617 1616 2947 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32]) (reg/f:SI 0 r0 [1057])) 61 {pushsi_internal} (nil)) (insn 2947 1617 1618 65 Convert.c:707 (set (reg/f:SI 0 r0 [1058]) (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil)) (insn 1618 2947 1619 65 Convert.c:707 (set (reg/f:SI 0 r0 [1058]) (plus:SI (reg/f:SI 0 r0 [1058]) (const_int 1 [0x1]))) 1 {addsi3} (nil)) (insn 1619 1618 1620 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32]) (reg/f:SI 0 r0 [1058])) 61 {pushsi_internal} (nil)) (insn 1620 1619 1621 65 Convert.c:707 (set (reg/f:SI 15 sp) (plus:SI (reg/f:SI 15 sp) (const_int -10 [0xfffffff6]))) 1 {addsi3} (nil)) (insn 1621 1620 1624 65 Convert.c:707 (set (reg/f:SI 2 r2 [1059]) (reg/f:SI 15 sp)) 67 {*movsi_long} (nil)) (insn 1624 1621 1625 65 Convert.c:707 (set (reg:SI 0 r0 [1062]) (const_int 9 [0x9])) 67 {*movsi_long} (expr_list:REG_EQUIV (const_int 9 [0x9]) (nil))) (insn 1625 1624 1626 65 Convert.c:707 (set (mem/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32]) (reg:SI 0 r0 [1062])) 61 {pushsi_internal} (expr_list:REG_EQUAL (const_int 9 [0x9]) (nil))) (insn 1626 1625 1628 65 Convert.c:707 (set (reg:SI 4 r4) (reg/f:SI 8 r8 [1368])) 67 {*movsi_long} (expr_list:REG_EQUAL (plus:SI (reg/f:SI 13 r13) (const_int 28 [0x1c])) (nil))) (call_insn 1628 1626 1629 65 Convert.c:707 (parallel [ (set (reg:SI 0 r0) (call (mem:QI (symbol_ref:SI ("memcpy") [flags 0x41] <function_decl 0xb75a7600 memcpy>) [0 S1 A8]) (const_int 4 [0x4]))) (clobber (reg:SI 14 ra)) ]) 96 {cr16_call_value_insn_branch} (expr_list:REG_EH_REGION (const_int 0 [0x0]) (nil)) (expr_list:REG_DEP_TRUE (use (reg:SI 2 r2)) (expr_list:REG_DEP_TRUE (use (reg:SI 4 r4)) (nil)))) (insn 1629 1628 1632 65 Convert.c:707 (set (reg/f:SI 15 sp) (plus:SI (reg/f:SI 15 sp) (const_int 4 [0x4]))) 1 {addsi3} (nil)) (insn 1632 1629 1633 65 Convert.c:707 (set (reg:SI 4 r4) (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil)) (insn 1633 1632 1634 65 Convert.c:707 (set (reg:SI 2 r2) (reg/f:SI 10 r10 [1371])) 67 {*movsi_long} (expr_list:REG_EQUAL (plus:SI (reg/f:SI 13 r13) (const_int 2 [0x2])) (nil))) (call_insn 1634 1633 2948 65 Convert.c:707 (parallel [ (call (mem:QI (symbol_ref:SI ("ReadFH") [flags 0x41] <function_decl 0xb7852000 ReadFH>) [0 S1 A8]) (const_int 28 [0x1c])) (clobber (reg:SI 14 ra)) ]) 93 {cr16_call_insn_branch} (nil) (expr_list:REG_DEP_TRUE (use (reg:SI 2 r2)) (expr_list:REG_DEP_TRUE (use (reg:SI 4 r4)) (nil))))
(call_insn 1604 1602 1605 65 RcHeap.h:351 (parallel [ (set (reg:SI 0 r0) (call (mem:QI (symbol_ref:SI ("RAlloc") [flags 0x41] <function_decl 0xb7dfc980 RAlloc>) [0 S1 A8]) (const_int 0 [0x0]))) (clobber (reg:SI 14 ra)) ]) 96 {cr16_call_value_insn_branch} (nil) (expr_list:REG_DEP_TRUE (use (reg:HI 2 r2)) (expr_list:REG_DEP_TRUE (use (reg:HI 3 r3)) (expr_list:REG_DEP_TRUE (use (reg:SI 4 r4)) (nil))))) (insn 1605 1604 1607 65 RcHeap.h:351 (set (reg/f:SI 12 r12 [orig:525 D.18812 ] [525]) (reg:SI 0 r0)) 67 {*movsi_long} (nil)) (insn 1607 1605 1609 65 Convert.c:704 (set (mem/f/c/i:SI (plus:SI (reg/f:SI 15 sp) (const_int 20 [0x14])) [7 IeData+0 S4 A32]) (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil)) (debug_insn 1609 1607 1610 65 Convert.c:706 (var_location:SI p (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) -1 (nil)) (insn 1610 1609 2943 65 Convert.c:707 (set (reg/f:SI 15 sp) (plus:SI (reg/f:SI 15 sp) (const_int -2 [0xfffffffe]))) 1 {addsi3} (nil)) (insn 2943 1610 2944 65 Convert.c:707 (set (reg:SI 1 r1) (const_int 18 [0x12])) 67 {*movsi_long} (nil)) (insn 2944 2943 1613 65 Convert.c:707 (set (reg:SI 1 r1) (plus:SI (reg:SI 1 r1) (reg/f:SI 15 sp))) 1 {addsi3} (expr_list:REG_EQUIV (plus:SI (reg/f:SI 15 sp) (const_int 18 [0x12])) (nil))) (insn 1613 2944 1614 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32]) (reg:SI 1 r1)) 61 {pushsi_internal} (nil)) (insn 1614 1613 1615 65 Convert.c:707 (set (reg/f:SI 0 r0 [1056]) (plus:SI (reg/f:SI 0 r0 [1056]) (const_int 3 [0x3]))) 1 {addsi3} (nil)) (insn 1615 1614 1616 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32]) (reg/f:SI 0 r0 [1056])) 61 {pushsi_internal} (nil)) (insn 1616 1615 1617 65 Convert.c:707 (set (reg/f:SI 0 r0 [1057]) (plus:SI (reg/f:SI 0 r0 [1057]) (const_int -1 [0xffffffff]))) 1 {addsi3} (nil)) (insn 1617 1616 1618 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32]) (reg/f:SI 0 r0 [1057])) 61 {pushsi_internal} (nil)) (insn 1618 1617 1619 65 Convert.c:707 (set (reg/f:SI 0 r0 [1058]) (plus:SI (reg/f:SI 0 r0 [1058]) (const_int -1 [0xffffffff]))) 1 {addsi3} (nil)) (insn 1619 1618 1620 65 Convert.c:707 (set (mem/f/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32]) (reg/f:SI 0 r0 [1058])) 61 {pushsi_internal} (nil)) (insn 1620 1619 1621 65 Convert.c:707 (set (reg/f:SI 15 sp) (plus:SI (reg/f:SI 15 sp) (const_int -10 [0xfffffff6]))) 1 {addsi3} (nil)) (insn 1621 1620 1624 65 Convert.c:707 (set (reg/f:SI 2 r2 [1059]) (reg/f:SI 15 sp)) 67 {*movsi_long} (nil)) (insn 1624 1621 1625 65 Convert.c:707 (set (reg:SI 0 r0 [1062]) (const_int 9 [0x9])) 67 {*movsi_long} (expr_list:REG_EQUIV (const_int 9 [0x9]) (nil))) (insn 1625 1624 1626 65 Convert.c:707 (set (mem/i:SI (pre_dec:SI (reg/f:SI 15 sp)) [0 S4 A32]) (reg:SI 0 r0 [1062])) 61 {pushsi_internal} (expr_list:REG_EQUAL (const_int 9 [0x9]) (nil))) (insn 1626 1625 1628 65 Convert.c:707 (set (reg:SI 4 r4) (reg/f:SI 8 r8 [1368])) 67 {*movsi_long} (expr_list:REG_EQUAL (plus:SI (reg/f:SI 13 r13) (const_int 28 [0x1c])) (nil))) (call_insn 1628 1626 1629 65 Convert.c:707 (parallel [ (set (reg:SI 0 r0) (call (mem:QI (symbol_ref:SI ("memcpy") [flags 0x41] <function_decl 0xb75a7600 memcpy>) [0 S1 A8]) (const_int 4 [0x4]))) (clobber (reg:SI 14 ra)) ]) 96 {cr16_call_value_insn_branch} (expr_list:REG_EH_REGION (const_int 0 [0x0]) (nil)) (expr_list:REG_DEP_TRUE (use (reg:SI 2 r2)) (expr_list:REG_DEP_TRUE (use (reg:SI 4 r4)) (nil)))) (insn 1629 1628 1632 65 Convert.c:707 (set (reg/f:SI 15 sp) (plus:SI (reg/f:SI 15 sp) (const_int 4 [0x4]))) 1 {addsi3} (nil)) (insn 1632 1629 1633 65 Convert.c:707 (set (reg:SI 4 r4) (reg/f:SI 12 r12 [orig:525 D.18812 ] [525])) 67 {*movsi_long} (nil)) (insn 1633 1632 1634 65 Convert.c:707 (set (reg:SI 2 r2) (reg/f:SI 10 r10 [1371])) 67 {*movsi_long} (expr_list:REG_EQUAL (plus:SI (reg/f:SI 13 r13) (const_int 2 [0x2])) (nil))) (call_insn 1634 1633 2948 65 Convert.c:707 (parallel [ (call (mem:QI (symbol_ref:SI ("LasReadFieldAttachedHandsets") [flags 0x41] <function_decl 0xb7852000 LasReadFieldAttachedHandsets>) [0 S1 A8]) (const_int 28 [0x1c])) (clobber (reg:SI 14 ra)) ]) 93 {cr16_call_insn_branch} (nil) (expr_list:REG_DEP_TRUE (use (reg:SI 2 r2)) (expr_list:REG_DEP_TRUE (use (reg:SI 4 r4)) (nil))))