The antonym of "off-core" should be "on-core" rather than "in-core".
Consistently use "on-core" in the overheads section.
Similarly, say "on-socket" rather than "in-socket".

Also for consistency, replace "single-CPU CAS" with "same-CPU CAS".

Also, the QQz added in commit 34cc066b1d95 ("cpu: Add a QQz on table E.1")
uppercased some related words in running text.
Lowercase them for consistency.

Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx>
---
 cpu/overheads.tex | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/cpu/overheads.tex b/cpu/overheads.tex
index 7ae99ed6cb7b..af17b3cfdf2f 100644
--- a/cpu/overheads.tex
+++ b/cpu/overheads.tex
@@ -159,7 +159,7 @@ optimization.
 & CAS & 7.0 & 14.6 & \\
 & lock & 15.4 & 32.3 & \\
 \midrule
- \multicolumn{2}{l}{In-Core}
+ \multicolumn{2}{l}{On-Core}
 & & & 224 \\
 & Blind CAS& 7.2 & 15.2 & \\
 & CAS & 18.0 & 37.7 & \\
@@ -223,7 +223,7 @@ The lock operation is more expensive than CAS because it requires two
 atomic operations on the lock data structure, one for acquisition and the
 other for release.
 
-In-core operations involving interactions between the hardware threads
+On-core operations involving interactions between the hardware threads
 sharing a single core are about the same cost as same-CPU operations.
 This should not be too surprising, given that these two hardware threads
 also share the full cache hierarchy.
@@ -253,10 +253,10 @@ failing.
 
 The key point is that there are now two accesses to the memory
 location, the load and the CAS\@.
-Thus, it is not surprising that in-core blind CAS consumes only about
-seven nanoseconds, while in-core CAS consumes about 18 nanoseconds.
+Thus, it is not surprising that on-core blind CAS consumes only about
+seven nanoseconds, while on-core CAS consumes about 18 nanoseconds.
 The non-blind case's extra load does not come for free.
-That said, the overhead of these operations are similar to single-CPU
+That said, the overhead of these operations are similar to same-CPU
 CAS and lock, respectively.
 
 \QuickQuiz{
@@ -351,7 +351,7 @@ thousand clock cycles.
 & CAS & 12.2 & 33.8 \\
 & lock & 25.6 & 71.2 \\
 \midrule
- \multicolumn{2}{l}{In-Core}
+ \multicolumn{2}{l}{On-Core}
 & & \\
 & Blind CAS & 12.9 & 35.8 \\
 & CAS & 7.0 & 19.4 \\
@@ -393,7 +393,7 @@ thousand clock cycles.
 which represents a much smaller system with only 16~hardware threads.
 A similar view is provided by the rows of
 \cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
- down to and including the two ``Off-core'' rows.
+ down to and including the two ``Off-Core'' rows.
 
 \begin{table}
 %\rowcolors{1}{}{lightgray}
@@ -420,7 +420,7 @@ thousand clock cycles.
 & CAS & 6.2 & 13.6 & \\
 & lock & 13.5 & 29.6 & \\
 \midrule
- \multicolumn{2}{l}{In-Core} & & & 6 \\
+ \multicolumn{2}{l}{On-Core} & & & 6 \\
 & Blind CAS & 6.5 & 14.3 & \\
 & CAS & 16.2 & 35.6 & \\
 \midrule
@@ -470,7 +470,7 @@ thousand clock cycles.
 \QuickQuizE{
 \Cref{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
 in the answer to \QuickQuizARef{\QspeedOfLightAtoms} says that
- In-Core CAS is faster than both of Same-CPU CAS and In-Core Blind CAS\@.
+ on-core CAS is faster than both of same-CPU CAS and on-core blind CAS\@.
 What is happening there?
 }\QuickQuizAnswerE{
 I \emph{was} surprised by the data I obtained and did a rigorous
@@ -508,7 +508,7 @@ First, there are only two CPUs within a given core and only 56
 within a given socket, compared to 448 across the system.
 Second, as shown in
 \cref{tab:cpu:Cache Geometry for 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10GHz},
-the in-core caches are quite small compared to the in-socket caches, which
+the on-core caches are quite small compared to the on-socket caches, which
 are in turn quite small compared to the 1.4\,TB of memory configured on
 this system.
 Third, again referring to the figure, the caches are organized as
-- 
2.25.1