Re: Problem with GNU C++ Inline Assembler

Ian Lance Taylor <iant@xxxxxxxxxx> · Tue, 31 Mar 2009 15:53:28 -0700

rikardojala <rikardojala@xxxxxxxxx> writes:

> I am writing a (32-bit)program in which a very time-critical task is to get
> the position of the least significant bit of a long long (64-bit) integer,
> and clearing it.

The code that gcc generates for something like

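unsigned long long
foo (unsigned long long ll)
{
  unsigned long hi, lo;
  int i;

  /* Split into 32-bit halves; unsigned long is 32 bits on this target.  */
  hi = (ll >> 32) & 0xffffffffLLU;
  lo = ll & 0xffffffffLLU;
  /* __builtin_ffsl returns one plus the index of the lowest set bit,
     or zero if no bit is set.  */
  i = __builtin_ffsl (lo);
  if (i != 0)
    lo &= ~ (1LU << (i - 1));
  else
    {
      i = __builtin_ffsl (hi);
      if (i != 0)
	hi &= ~ (1LU << (i - 1));
    }
  return ((unsigned long long) hi << 32) | (unsigned long long) lo;
}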

is not too bad, provided you compile with -march=i686 so that gcc knows
that it can use the bsfl instruction.
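
One detail that is easy to get wrong with __builtin_ffsl: it returns one
plus the index of the lowest set bit, and zero when no bit is set, so a
mask built from its result has to shift by i - 1.  A minimal sketch for
a single word (the helper name clear_lowest_set_bit is made up for
illustration, it is not from this thread):

```c
/* Hypothetical helper, not from the thread: clear the lowest set bit of v.  */
static unsigned long
clear_lowest_set_bit (unsigned long v)
{
  /* One plus the bit index of the lowest set bit, or 0 when v == 0.  */
  int i = __builtin_ffsl ((long) v);

  if (i != 0)
    v &= ~(1UL << (i - 1));   /* i - 1 is the zero-based bit index */
  return v;
}
```

For example, 0x28 has bits 3 and 5 set; __builtin_ffsl gives 4, and the
helper returns 0x20.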

> I have the following assembler macro utilizing the bsf
> instruction to find the lsb :
>
> #define get_lsb_and_clear(__input_bb,__output_sq) \
> 	asm volatile("xorl %%edx, %%edx;"\
> 			"xorl %0, %0;"\
> 			"incl %%edx;"\
> 			"bsfl (%%ebx),%%ecx;"\    //Check for lsb in lower 32-bit dword
> 			"jnz 1f;"\
> 			"leal 4(%%ebx),%%ebx;"\  //If not found, Step to the upper 32-bit dword
> 			"bsfl (%%ebx),%%ecx;"\   //Check for lsb in upper 32-bit dword
> 			"xorl $32,%0;"\
> 			"1: shll %%cl,%%edx;"\
> 			"xorl %%edx,(%%ebx);"\
> 			"xorl %%ecx,%0;": "=&r" (__output_sq):"b" (&__input_bb):"%edx","%ecx","memory")
>
> This works, but is rather slow. As I understand it, having "memory" in your
> clobber-list is bad for performance as the compiler cannot make assumptions
> as to the state of the memory over the asm, and since I am interested in
> high performance I would like to give the compiler a good chance at
> optimization. I have tried to remove this constraint by using "=m"
> (__input_bb) as output instead (as I've seen done in some tutorials), but
> then the address %1 cannot be changed to the upper 32-bits. How do I get
> around this problem? Is there any way of changing the address when using
> "=m"? Can I get rid of the "memory" at all?

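The easy approach would be to also pass __input_bb itself as a memory
operand in the output list: "+m" (__input_bb).  (It has to be the
object, not its address, because an output operand must be an lvalue;
"+" rather than "=" because the asm reads the value as well as writing
it.)  That tells gcc that the assembler code modifies __input_bb, so
the "memory" clobber is no longer needed.  You don't have to actually
use the operand in your assembler code; keep passing the address in a
register as you do now.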
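
As a rough sketch of what that could look like (not tuned or
benchmarked): the function name pop_lsb and its signature are my own
invention, and I have swapped the shift/xor sequence from the macro for
a single btrl, which treats the memory operand as a bit string and so
clears the right bit in either dword.  The plain C branch is only there
so the sketch also builds on non-x86 targets.

```c
#include <stdint.h>

/* Hypothetical helper (name and signature are illustrative, not from
   the thread).  Returns the index (0..63) of the lowest set bit of *bb
   and clears that bit.  *bb must be nonzero.  */
static inline int
pop_lsb (uint64_t *bb)
{
#if defined (__i386__) || defined (__x86_64__)
  int sq;

  __asm__ ("bsfl (%2), %0\n\t"     /* search the low dword */
           "jnz 1f\n\t"            /* ZF clear: a bit was found */
           "bsfl 4(%2), %0\n\t"    /* low dword was zero: search the high dword */
           "addl $32, %0\n"
           "1:\n\t"
           "btrl %0, (%2)"         /* clear bit number %0 of the 64-bit value */
           : "=&r" (sq),           /* %0: early-clobber, written before %2 is last used */
             "+m" (*bb)            /* %1: never named in the template; it only tells
                                      gcc that *bb is read and modified */
           : "r" (bb)              /* %2: address of the value */
           : "cc");                /* flags are changed; no "memory" clobber */
  return sq;
#else
  /* Assumption: non-x86 fallback in plain C with a GCC builtin.  */
  int sq = __builtin_ctzll (*bb);

  *bb &= *bb - 1;                  /* clears the lowest set bit */
  return sq;
#endif
}
```

With uint64_t bb = 0x28, pop_lsb (&bb) returns 3 and leaves bb == 0x20.
Because gcc now knows exactly which object the asm touches, it is free
to keep unrelated values in registers across it.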

Ian