On Thu, 17 Mar 2011, Eric Dumazet wrote:

> > > I wonder why we dont use :
> > >
> > > addl $0x2,%fs:xt_u64
> > > addcl $0x0,%fs:xt_u64+4
> >
> > The compiler is fed the following
> >
> > *__this_cpu_ptr(xt_u64) += 2
> >
> > __this_cpu_ptr makes it:
> >
> > *(xt_u64 + __my_cpu_offset) += 2
> >
> > So the compiler calculates the address first and then increments it.
> >
> > The compiler could optimize this I think. Wonder why that does not happen.
>
> Compiler is really forced to compute addr, thats why.
>
> Hmm, we should not fallback to generic ops I think, but tweak
>
> percpu_add_op() {

percpu_add_op() is not used. This is a 64 bit operation on a 32 bit
machine, thus we fall back to __this_cpu_generic_to_op():

#define __this_cpu_generic_to_op(pcp, val, op)				\
do {									\
	*__this_cpu_ptr(&(pcp)) op val;					\
} while (0)
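To make the difference concrete, here is a rough userspace sketch of what the generic fallback does. It is not kernel code: the sim_* names, NR_CPUS, the xt_u64_copies array and the my_cpu_offset variable are all made up for illustration and only stand in for __this_cpu_ptr(), the per-cpu area and __my_cpu_offset. The point it shows is that the generic op first forms an ordinary pointer (base + offset) and then does a plain read-modify-write through it. add_with_carry() spells out in C what an add/add-with-carry pair on the two 32 bit halves would compute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 4	/* hypothetical number of simulated CPUs */

/* One copy of the counter per simulated CPU (stand-in for a per-cpu var). */
static uint64_t xt_u64_copies[NR_CPUS];

/* Hypothetical stand-in for __my_cpu_offset: byte offset of "our" copy. */
static size_t my_cpu_offset;

static void set_current_cpu(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	my_cpu_offset = cpu * sizeof(uint64_t);
}

/* Stand-in for __this_cpu_ptr(): compute the address of this CPU's copy. */
static uint64_t *sim_this_cpu_ptr(uint64_t *base)
{
	return (uint64_t *)((char *)base + my_cpu_offset);
}

/* Same shape as the generic op: address first, then a plain C operation. */
#define sim_this_cpu_generic_to_op(pcp, val, op)			\
do {									\
	*sim_this_cpu_ptr(&(pcp)) op val;				\
} while (0)

/* Add 2 to the copy belonging to @cpu and return the new value. */
static uint64_t add_two_on_cpu(unsigned int cpu)
{
	set_current_cpu(cpu);
	sim_this_cpu_generic_to_op(xt_u64_copies[0], 2, +=);
	return xt_u64_copies[cpu];
}

/* 64 bit add done as two 32 bit halves, carry propagated by hand. */
static uint64_t add_with_carry(uint64_t v, uint32_t inc)
{
	uint32_t lo = (uint32_t)v;
	uint32_t hi = (uint32_t)(v >> 32);
	uint32_t new_lo = lo + inc;	/* low half, may wrap around */

	hi += (new_lo < lo);		/* wrap-around means carry out */
	return ((uint64_t)hi << 32) | new_lo;
}
```

Calling add_two_on_cpu(1) twice leaves 4 in the second copy and does not touch the first one. add_with_carry(0xffffffff, 2) gives 0x100000001, i.e. the carry out of the low word lands in the high word.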