[PATCH -perfbook v2 2/3] memorder: Add info on recent x86 implementation of smp_mb()

Akira Yokosawa <akiyks@xxxxxxxxx> · Sat, 14 Oct 2023 17:42:16 +0900

From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx>

The Linux kernel implements smp_mb() on x86 as lock;addl rather than
mfence.  Add a paragraph describing this.

Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
Co-developed-by: Akira Yokosawa <akiyks@xxxxxxxxx>
Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx>
---
Changes in v2 (by akiyks):
  - Apply punctuation conventions of perfbook LaTeX source.
      - Break lines at sentence-ending punctuation marks.
  - Overall wordsmith.
      - Fix typo in Subject. (implementation)
      - Drop confusing "the"s.
      - Use "lock;addl" for consistency in the section.
      - Reword "instead of directly modifying SP", which surprised
        me a bit.
      - Reorder the final sentence to make it obvious that mb() is the
        one that uses mfence.
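
Illustration for reviewers: a minimal user-space sketch of the idiom the
new paragraph describes, namely a locked no-op add at a 4-byte negative
offset from the stack pointer used as a full barrier. The names
sketch_smp_mb() and sketch_store_then_load() are hypothetical and exist
only for this example; this is not the kernel's macro, and the non-x86-64
fallback is an assumption added so the sketch builds elsewhere.

```c
/* Hypothetical helper for illustration; not the kernel's definition. */
static inline void sketch_smp_mb(void)
{
#if defined(__x86_64__)
	/*
	 * Locked add of zero just below the stack pointer.  The value in
	 * memory is left unchanged, so only the ordering effect remains.
	 */
	__asm__ __volatile__("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
#else
	/* Assumed fallback: compiler-provided full barrier. */
	__sync_synchronize();
#endif
}

/*
 * Store-then-load pattern that needs a full barrier: write *mine,
 * then read *theirs, with the store ordered before the load.
 */
static inline int sketch_store_then_load(volatile int *mine,
					 volatile int *theirs)
{
	*mine = 1;
	sketch_smp_mb();
	return *theirs;
}
```

A single-threaded call only shows that the sequence runs and leaves
memory intact; observing the ordering effect itself would need two
threads running the mirrored pattern.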
---
 memorder/memorder.tex | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/memorder/memorder.tex b/memorder/memorder.tex
index 5c978fbef172..6b9c3268e589 100644
--- a/memorder/memorder.tex
+++ b/memorder/memorder.tex
@@ -6081,6 +6081,16 @@ A few older variants of the x86 CPU have a mode bit that enables out-of-order
 stores, and for these CPUs, \co{smp_wmb()} must also be defined to
 be \co{lock;addl}.
 
+A 2017 kernel commit by Michael S.~Tsirkin replaced \co{mfence} with
+\co{lock;addl} in \co{smp_mb()}, achieving a 60 percent performance
+boost~\cite{Tsirkin2017}.
+Rather than accessing the memory pointed to by \co{SP} directly,
+the change targets a 4-byte negative offset from \co{SP}, which
+avoids slowness due to false data dependencies.
+\co{clflush} users still need to use \co{mfence} for ordering.
+Therefore, they were converted to use \co{mb()}, which uses \co{mfence}
+as before, instead of \co{smp_mb()}.
+
 Although newer x86 implementations accommodate self-modifying code
 without any special instructions, to be fully compatible with
 past and potential future x86 implementations, a given CPU must
-- 
2.25.1