I'm trying to time the 'pshufb' instruction on Core 2, so I have written the following code and compiled it with -O3:

    long rdtsc(void)
    {
      long x,y;
      asm volatile("rdtsc" : "=a"(x), "=d"(y));
      return (y<<32)+x;
    }

    int main(void)
    {
      __m128i x,y,x1,y1;
      for (int rounds=3; rounds<1000; rounds+=rounds)
        {
          long t0 = rdtsc();
          x = _mm_set_epi8(1,3,5,7,9,11,13,15,0,2,4,6,8,10,12,14);
          y = _mm_set_epi8(31,41,59,26,53,58,97,93,23,84,62,64,33,83,27,95);
          int wurzel = rounds;
          asm volatile("movaps (%[left]),%%xmm0\n"
                       "movaps (%[right]),%%xmm1\n"
                       "U:\n"
                       "pshufb %%xmm0,%%xmm1\n"
                       "sub $1,%[rounds]\n"
                       "jne U\n"
                       "movaps %%xmm0,(%[left1])\n"
                       "movaps %%xmm1,(%[right1])\n"
                       : :
                       [left] "r" (&x), [right] "r" (&y),
                       [left1] "r" (&x1), [right1] "r" (&y1),
                       [rounds] "r" (wurzel));
          long t1 = rdtsc();
          cout << wurzel << " " << rounds << " " << t1-t0 << endl;
        }
    }

The problem is that the values of both 'rounds' and 'wurzel' are zero at the end of the call. How do I tell the compiler that the register called '[rounds]' will be changed by the assembler section, so that it can't just reuse the register that 'wurzel' happened to be stored in at the time?

If I add an output constraint '[rounds] "=r" (wurzel)' and change the last input constraint to '"0" (rounds)', then the variable 'rounds' doesn't change and the code works, but this doesn't feel like the right answer.

You'll also notice that the code for loading x and y is a kludge; obviously I'd prefer '[left] "SSE_register" (x)'. Is there an asm constraint that means 'SSE register'?

Thanks in advance for your help,

Tom
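
For reference, here is a minimal sketch of what the two questions above seem to be asking for, assuming GCC's extended asm on x86-64. As far as I understand the GCC documentation, "+r" marks an operand as both read and written, and "x" asks for any SSE register. The helper name shuffle_loop and its parameters are hypothetical and not part of the original code; treat this as an illustration, not a definitive answer.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2: __m128i, _mm_set_epi8, _mm_storeu_si128

// Hypothetical helper (not from the original post): applies
// `pshufb mask, data` `rounds` times and returns the final data.
// `remaining` receives the loop counter after the asm (0 on exit).
// Needs a CPU with SSSE3; `rounds` must be at least 1.
static __m128i shuffle_loop(__m128i data, __m128i mask, int rounds, int &remaining)
{
  assert(rounds >= 1);   // with 0 the counter would wrap around
  int counter = rounds;  // scratch copy, so the caller's value is untouched
  asm volatile(
      "1:\n\t"                        // local label, safe if the asm is duplicated
      "pshufb %[mask], %[data]\n\t"   // data[i] = data[mask[i] & 15] (0 if bit 7 set)
      "sub $1, %[counter]\n\t"
      "jne 1b\n\t"
      : [data] "+&x"(data),           // read-write, early-clobbered SSE register
        [counter] "+r"(counter)       // read-write general-purpose register
      : [mask] "x"(mask)              // read-only SSE register
      : "cc");                        // sub changes the flags
  remaining = counter;
  return data;
}
```

In the loop from the post, something like 'y1 = shuffle_loop(y, x, rounds, wurzel);' could stand in for the asm statement and its four movaps instructions: the compiler then does the loads and stores itself, and 'rounds' keeps its value because only the scratch copy is decremented.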