On Thu, Jan 5, 2012 at 11:15 AM, Christoph Lameter <cl@xxxxxxxxx> wrote:
>
> XADD and ADD have the same cycle count if the ADD is used to add to a
> memory location. Both use 4 microops.

Christ, stop counting uops. They are some random internal
microarchitectural detail, and ignores things like processor uop
scheduling and decoding issues.

The thing that matters is (a) performance and (b) code sanity.

The "local_cpu_return()" operations fail *seriously* in the code
sanity department. They are a fundamentally insane operation.

And nothing you have said says "it's a huge performance win" to me.
First off, there aren't even very many users, and the users there are
seem to not even be all that performance-critical.

For statistics, just regular "add" is the normal thing to do. The
add_return thing is insane, for all the reasons I already outlined. It
*fundamentally* doesn't have any sane semantics.

Just remove it.

                 Linus