rikardojala <rikardojala@xxxxxxxxx> writes:

> I am writing a (32-bit) program in which a very time-critical task is to get
> the position of the least significant bit of a long long (64-bit) integer,
> and to clear it.

The code gcc generates from something like

long long
foo (unsigned long long ll)
{
  unsigned long hi, lo;
  int i;

  hi = (ll >> 32) & 0xffffffffLLU;
  lo = ll & 0xffffffffLLU;
  i = __builtin_ffsl (lo);     /* 1 + index of the lowest set bit, or 0 */
  if (i != 0)
    lo &= ~ (1LU << (i - 1));
  else
    {
      i = __builtin_ffsl (hi);
      if (i != 0)
        hi &= ~ (1LU << (i - 1));
    }
  return ((unsigned long long) hi << 32) | (unsigned long long) lo;
}

is not too bad, provided you compile with -march=i686 so that gcc knows
that it can use the bsfl instruction.

> I have the following assembler macro utilizing the bsf
> instruction to find the lsb:
>
> #define get_lsb_and_clear(__input_bb,__output_sq) \
> asm volatile("xorl %%edx, %%edx;"\
> "xorl %0, %0;"\
> "incl %%edx;"\
> "bsfl (%%ebx),%%ecx;"\ //Check for lsb in lower 32-bit dword
> "jnz 1f;"\
> "leal 4(%%ebx),%%ebx;"\ //If not found, Step to the upper 32-bit dword
> "bsfl (%%ebx),%%ecx;"\ //Check for lsb in upper 32-bit dword
> "xorl $32,%0;"\
> "1: shll %%cl,%%edx;"\
> "xorl %%edx,(%%ebx);"\
> "xorl %%ecx,%0;": "=&r" (__output_sq):"b"
> (&__input_bb):"%edx","%ecx","memory")
>
> This works, but is rather slow. As I understand it, having "memory" in your
> clobber-list is bad for performance as the compiler cannot make assumptions
> as to the state of the memory over the asm, and since I am interested in
> high performance I would like to give the compiler a good chance at
> optimization. I have tried to remove this constraint by using "=m"
> (__input_bb) as output instead (as I've seen done in some tutorials), but
> then the address %1 cannot be changed to the upper 32-bits. How do I get
> around this problem? Is there any way of changing the address when using
> "=m"? Can I get rid of the "memory" at all?

The easy approach would be to keep the "b" (&__input_bb) input and also
pass __input_bb itself (the object, not its address) as a memory output
operand: "+m" (__input_bb). Use "+m" rather than "=m", since the asm also
reads the old value.
That will tell gcc that the assembler code modifies __input_bb. You don't
have to actually use the operand in your assembler code.

Ian