On Tue, Aug 28, 2007 at 11:02:49PM +0100, Darryl Miles wrote:

> Thanks for the note on the peephole, can the peephole substitute
> sequences when there is overlapping lifetimes of various processor
> features.  For example the 'flags' bits, you can't peephole a sequence
> that does a compare (setting flag bits) then loads a register with zero
> (not affecting flag bits) then does a branch based on flag bits,
> replacing the loads a register with zero with 'xor' on i386 would
> destroy the flags.

Peephole definitions check for cases like this and won't do the
optimization clobbering the flags register if the flags register is
live at that point.

> 0000000000000090 <u64_divide>:
>   00:  49 89 d1   mov    %rdx,%r9      <<- [1] save %rdx in %r9 for arg-as-return
>   03:  48 8b 07   mov    (%rdi),%rax
>   06:  ?? ?? ??   xor    %edx,%edx     <<- implicit zero of high 32bits, would accept xorq %rdx,%rdx

Right, that's why I suggest using "gcc -S -dp" because then it clearly
shows if it's a 32-bit (*movsi_xxx) or a 64-bit (*movdi_xxx) instruction
(as seen from GCC's point of view, since the actual CPU instruction is
the same in this and several other cases).

>   09:  ?? ??      xor    %r8d,%r8d

Likewise.

>   0b:  48 f7 36   divq   (%rsi)
>   0e:  73 02      jae    12 <u64_divide+0x12>
>   10:  ?? ??      inc    %r8d

Can't you substitute the "jae; inc %r8d" sequence with "adcl $0, %r8d"?

>   12:  49 89 01   mov    %rax,(%r9)    <<- [1] use saved %rdx to return argument
>   15:  48 89 11   mov    %rdx,(%rcx)
>   18:  ?? ??      mov    %r8d,%eax
>   1a:  c3         retq
>
> I also did not say which version of GCC I was using, it was 4.0.2, but
> I've just tried with 4.2.1 and the same code is generated, although -O6
> appears to try and inline things further which lead me to find an
> invalid constraint "g" ((*divisor)) should be "r" ((*divisor)).  Since
> it tried to use a constant, although a register or memory via indirected
> register is valid here.

You can use "rm" for such a constraint.
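
As a rough sketch of the "rm" suggestion (not the u64_divide code from
the thread): div_u64 below is a hypothetical helper, and the operand
layout is my assumption. With "rm" the compiler may place the divisor in
a register or in memory, but never passes it as an immediate, which divq
cannot encode. The plain C fallback is only there so the file still
builds on other targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the code from the thread: divides a 64-bit
   dividend by a 64-bit divisor, stores the remainder, returns the quotient. */
static inline uint64_t div_u64(uint64_t dividend, uint64_t divisor,
                               uint64_t *remainder)
{
    assert(divisor != 0);   /* divq would trap on a zero divisor */
#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t quotient, rem;
    /* divq divides %rdx:%rax by its operand; quotient in %rax, remainder in %rdx.
       Operand 4 uses "rm": a register or a memory reference, never a constant. */
    __asm__ ("divq %4"
             : "=a" (quotient), "=d" (rem)
             : "0" (dividend), "1" ((uint64_t) 0), "rm" (divisor)
             : "cc");
    *remainder = rem;
    return quotient;
#else
    /* Portable fallback for non-x86-64 targets. */
    *remainder = dividend % divisor;
    return dividend / divisor;
#endif
}
```

Compiling a caller with "gcc -O2 -S -dp" should show whether the register
or the memory alternative was picked for the divisor.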

> Another concern that occurs to me is that if the __asm__ constraints are
> not 100% perfect is there anyway to test/permutate every possible way
> for the compiler might generate the code.

I suppose you could write a script which outputs "calls" to the asm
construct with a constant, local variable (which we assume will end up
in a register) or global variable for each operand in turn, then try
compiling and assembling (i.e. -c) the resulting code.

-- 
Rask Ingemann Lambertsen
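
A rough sketch of such a generator, written in C rather than as a shell
script. ASM_UNDER_TEST and asm_under_test.h are hypothetical stand-ins
for the asm construct being checked, and two input operands are assumed.
It emits every combination of operand kinds, which is a superset of
varying each operand in turn.

```c
#include <assert.h>
#include <stdio.h>

/* Operand kinds to try: a literal constant, a local variable (likely to
   end up in a register) and a global variable (a memory operand). */
static const char *const operand_kinds[] = { "42", "local", "global" };
enum { NUM_KINDS = sizeof operand_kinds / sizeof operand_kinds[0] };

/* Writes one test function per (first, second) operand combination to
   `out` and returns the number of functions written.  ASM_UNDER_TEST is a
   hypothetical macro wrapping the asm construct being checked. */
static int emit_cases(FILE *out)
{
    int count = 0;

    assert(out != NULL);
    fputs("#include \"asm_under_test.h\"  /* hypothetical header */\n"
          "unsigned long global = 7;\n", out);

    for (int a = 0; a < NUM_KINDS; a++) {
        for (int b = 0; b < NUM_KINDS; b++) {
            fprintf(out,
                    "unsigned long case_%d(unsigned long local)\n"
                    "{\n"
                    "    return ASM_UNDER_TEST(%s, %s);\n"
                    "}\n",
                    count, operand_kinds[a], operand_kinds[b]);
            count++;
        }
    }
    return count;
}
```

Calling emit_cases(stdout) from a small main(), redirecting the output to
a file and running "gcc -O2 -c" on it should then report any combination
the constraints cannot satisfy or the assembler rejects.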