x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?

Dexuan Cui <decui@xxxxxxxxxxxxx> · Thu, 3 Mar 2016 14:33:15 +0000

Hi,
My understanding of arch/x86/include/asm/barrier.h is that Linux clearly
prefers the {L,S,M}FENCE instructions: Locked ADD is only used on x86_32
platforms that don't support SSE2 (the X86_FEATURE_XMM2 flag).
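
To make the comparison concrete, here is a minimal user-space sketch of the
two full-barrier sequences in question. It is not the kernel's actual macro
code. The function names (barrier_mfence, barrier_lock_add,
store_then_load) are hypothetical, GCC/Clang inline asm is assumed, and the
__sync_synchronize() fallback for non-x86 targets is only there so the
sketch compiles elsewhere.

```c
/* Sketch only: two ways to get a full (StoreLoad) barrier on x86.
 * All names here are hypothetical, not kernel identifiers. */

/* Full barrier via MFENCE. Needs SSE2, which every x86-64 CPU has;
 * on 32-bit x86 the CPU must support SSE2 or this instruction faults. */
static inline void barrier_mfence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __sync_synchronize();   /* assumption: compiler builtin as fallback */
#endif
}

/* Full barrier via a locked read-modify-write. Adding 0 to the word at
 * the top of the stack leaves the value unchanged; the LOCK prefix is
 * what provides the ordering. The add modifies EFLAGS, hence "cc". */
static inline void barrier_lock_add(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#else
    __sync_synchronize();   /* assumption: compiler builtin as fallback */
#endif
}

/* Store to one flag, then load another. This is the store-then-load
 * pattern where x86 needs an explicit barrier to prevent the load
 * from being satisfied before the store becomes globally visible. */
static inline int store_then_load(volatile int *mine,
                                  volatile int *other,
                                  int use_mfence)
{
    *mine = 1;
    if (use_mfence)
        barrier_mfence();
    else
        barrier_lock_add();
    return *other;
}
```

Both variants should give the same ordering for ordinary (write-back)
memory accesses, so the question is only about which one is cheaper.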

However, people seem to say that Locked ADD is much faster than the FENCE
instructions, even on modern Intel CPUs like Haswell. For example, please
see these three sources:

" 11.5.1 Locked Instructions as Memory Barriers
Optimization
Use locked instructions to implement Store/Store and Store/Load barriers.
"
http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf

"lock addl %(rsp), 0 is a better solution for StoreLoad barrier ":
http://shipilev.net/blog/2014/on-the-fence-with-dependencies/

"...locked instruction are more efficient barriers...":
http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/

I also found that FreeBSD prefers Locked ADD.

So, I'm curious why Linux prefers MFENCE.
I guess I may be missing something.

I tried to google the question, but didn't find an answer.

Thanks,
-- Dexuan
