Hi. I am writing a 32-bit program in which a very time-critical task is to
find the position of the least significant set bit of a long long (64-bit)
integer and then clear that bit. I have the following assembler macro, which
uses the bsf instruction to find the lsb:

    #define get_lsb_and_clear(__input_bb,__output_sq)                         \
        asm volatile("xorl %%edx, %%edx;"                                      \
                     "xorl %0, %0;"                                            \
                     "incl %%edx;"                                             \
                     "bsfl (%%ebx),%%ecx;"  /* check for the lsb in the lower 32-bit dword */ \
                     "jnz 1f;"                                                 \
                     "leal 4(%%ebx),%%ebx;" /* if not found, step to the upper 32-bit dword */ \
                     "bsfl (%%ebx),%%ecx;"  /* check for the lsb in the upper 32-bit dword */  \
                     "xorl $32,%0;"                                            \
                     "1: shll %%cl,%%edx;"                                     \
                     "xorl %%edx,(%%ebx);"                                     \
                     "xorl %%ecx,%0;"                                          \
                     : "=&r" (__output_sq)                                     \
                     : "b" (&__input_bb)                                       \
                     : "%edx", "%ecx", "memory")

This works, but it is rather slow. As I understand it, having "memory" in the
clobber list is bad for performance, because the compiler cannot make any
assumptions about the state of memory across the asm, and since I am
interested in high performance I would like to give the compiler a good
chance to optimize.

I have tried to get rid of the clobber by using "=m" (__input_bb) as an
output instead (as I have seen done in some tutorials), but then the address
%1 cannot be advanced to the upper 32-bit dword. How do I get around this
problem? Is there any way to change the address when using "=m"? Can I get
rid of the "memory" clobber at all?

Thanks for any help.

Rikard Ojala
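
P.S. In case it helps to make the intended semantics explicit, here is a
plain C reference version of what the macro is supposed to compute (just for
illustration, and the function name is my own; the asm version above is what
I actually use in the hot path):

    /* Return the index (0..63) of the least significant set bit of *bb and
       clear that bit in place.  Precondition: *bb != 0 (like bsf, the result
       is meaningless for zero, and here the loop would not terminate). */
    static inline int get_lsb_and_clear_ref(unsigned long long *bb)
    {
        unsigned long long b = *bb;
        int sq = 0;
        while (!(b & 1ULL)) {   /* count trailing zero bits */
            b >>= 1;
            ++sq;
        }
        *bb &= *bb - 1ULL;      /* clear the lowest set bit */
        return sq;
    }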