From 3b2c58c7e7f7abe4303383437502513f948d7401 Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@xxxxxxxxx>
Date: Sat, 18 Jun 2016 10:38:57 +0900
Subject: [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables

The numbers given in the 'Comms Fabric' and 'Global Comms' rows of
Tables 3.1 and D.1 seem inconsistent.

'Comms Fabric' latency in Table 3.1 is 3 microseconds. The latency
of InfiniBand DDR, which was available in 2005 (at the time of the
AMD Opteron 844), is 2.5 microseconds.

'Comms Fabric' latency in Table D.1 is 4.5 microseconds. The latency
of InfiniBand QDR, which was available in 2009 (at the time of the
Intel X5550 (Nehalem)), is 1.3 microseconds.

These latencies are for one-way communication. In the other rows of
the tables, the costs are for at least one round trip, so these
numbers need to be doubled for consistency. For 'Comms Fabric', it
would be better to use 5 microseconds in Table 3.1 and 2.6
microseconds in Table D.1. Of course, these numbers are for best
cases. Actual latency would depend on the topology and the
configuration of the fabric.

'Global Comms' latency in Table 3.1 is 130 ms, which is based on the
speed of light in vacuum. On the other hand, 'Global Comms' latency
in Table D.1 is 195 ms, which is based on the speed of light in
optical fiber. The number in Table D.1 is more realistic, and we
should use it in both tables.

This commit fixes these inconsistencies and modifies the related
explanation in the text accordingly.

Suggested-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx>
---
 cpu/overheads.tex | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/cpu/overheads.tex b/cpu/overheads.tex
index 311c43e..bfdd711 100644
--- a/cpu/overheads.tex
+++ b/cpu/overheads.tex
@@ -126,12 +126,12 @@ This simplified sequence is just the beginning of a discipline called
 	\hline
 	CAS cache miss		& 306.0	& 510.0 \\
 	\hline
-	Comms Fabric		& 3,000\textcolor{white}{.0}
-				& 5,000\textcolor{white}{.0}
+	Comms Fabric		& 5,000\textcolor{white}{.0}
+				& 8,330\textcolor{white}{.0}
 	\\
 	\hline
-	Global Comms		& 130,000,000\textcolor{white}{.0}
-				& 216,000,000\textcolor{white}{.0}
+	Global Comms		& 195,000,000\textcolor{white}{.0}
+				& 325,000,000\textcolor{white}{.0}
 	\\ \\
 \end{tabular}
 \caption{Performance of Synchronization Mechanisms on 4-CPU 1.8GHz AMD Opteron 844 System}
@@ -224,11 +224,11 @@ global agreement.
 	\hline
 	CAS cache miss		& 95.9	& 266.4 \\
 	\hline
-	Comms Fabric		& 4,500\textcolor{white}{.0}
-				& 7,500\textcolor{white}{.0} \\
+	Comms Fabric		& 2,600\textcolor{white}{.0}
+				& 7,220\textcolor{white}{.0} \\
 	\hline
 	Global Comms		& 195,000,000\textcolor{white}{.0}
-				& 324,000,000\textcolor{white}{.0} \\
+				& 542,000,000\textcolor{white}{.0} \\
 \end{tabular}
 \caption{Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
 \label{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
@@ -264,15 +264,19 @@ I/O operations are even more expensive.
 As shown in the ``Comms Fabric'' row,
 high performance (and expensive!) communications fabric, such as
 InfiniBand or any number of proprietary interconnects, has a latency
-of roughly three microseconds, during which time five \emph{thousand}
-instructions might have been executed.
+of roughly five microseconds for an end-to-end round trip, during which
+time more than eight \emph{thousand} instructions might have been executed.
 Standards-based communications networks often require some sort
 of protocol processing, which further increases the latency.
 Of course, geographic distance also increases latency, with the
-theoretical speed-of-light latency around the world coming to
-roughly 130 \emph{milliseconds}, or more than 200 million clock
+speed-of-light through optical fiber latency around the world coming to
+roughly 195 \emph{milliseconds}, or more than 300 million clock
 cycles, as shown in the ``Global Comms'' row.
 
+% Reference of Infiniband latency:
+% http://www.hpcadvisorycouncil.com/events/2014/swiss-workshop/presos/Day_1/1_Mellanox.pdf
+% page 6/76 'Leading Interconnect, Leading Performance'
 \QuickQuiz{}
 These numbers are insanely large!
 How can I possibly get my head around them?
-- 
1.9.1