Hi Joel,

On 2023/10/13 10:22, Joel Fernandes (Google) wrote:
> smp_mb() uses lock;add for x86 in the linux kernel. Add information
> about the same.
>
> Cc: paulmck@xxxxxxxxxx
> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> ---
> Not even build tested, just focused on the content and to keep my promise I'd
> send this out (better than never sending it) ;-). I appreciate maintainers of
> perfbook taking this forward ;-). Thanks!

I've just tested this... and it failed to build.

I think I'll post a v2 which will build, along with some wordsmithing
I can think of.

A few quick comments below.

>
>  bib/hw.bib            | 8 ++++++++

bib/memorymodel.bib looks like a more suitable destination.

>  memorder/memorder.tex | 8 ++++++++
>  2 files changed, 16 insertions(+)
>
> diff --git a/bib/hw.bib b/bib/hw.bib
> index b0885e74..b1dfd119 100644
> --- a/bib/hw.bib
> +++ b/bib/hw.bib
> @@ -1159,3 +1159,11 @@ Luis Stevens and Anoop Gupta and John Hennessy",
>  note="\url{https://github.com/google/fuzzing/blob/master/docs/silifuzz.pdf}",
>  }
>
> +@unpublished{Tsirkin2017,
> +	Author="Michael S. Tsirkin",
> +	Title="locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE",

The "_" in the title needs an escape.

> +	month="November",
> +	day="10",
> +	year="2017",
> +	note="\url{https://lore.kernel.org/all/tip-450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730@xxxxxxxxxxxxxx/}",
> +}
> diff --git a/memorder/memorder.tex b/memorder/memorder.tex
> index 5c978fbe..b28ac4f0 100644
> --- a/memorder/memorder.tex
> +++ b/memorder/memorder.tex
> @@ -6081,6 +6081,14 @@ A few older variants of the x86 CPU have a mode bit that enables out-of-order
>  stores, and for these CPUs, \co{smp_wmb()} must also be defined to
>  be \co{lock;addl}.
>
> +A 2017 kernel commit by Michael S. Tsirkin replaced \co{mfence} with
> +\co{lock add} in \co{smp_mb()}, achieving a 60 percent performance
> +boost~\cite{Tsirkin2017}. The change used a 4-byte negative offset from

A line break is needed after "\cite{Tsirkin2017}." (perfbook's LaTeX
source convention requires a line break at the end of each sentence).

> +the \co{SP} to avoid slowness due to false data-dependencies,
> +instead of directly modifying the \co{SP}. \co{clflush} users still
> +need to use \co{mfence} for ordering, so they have been converted to use
> +\co{mb} instead of \co{smp_mb}, which uses an \co{mfence} as before.
> +
>  Although newer x86 implementations accommodate self-modifying code
>  without any special instructions, to be fully compatible with
>  past and potential future x86 implementations, a given CPU must

Anyway, please wait for my v2.

        Thanks, Akira