On Sat, Jun 18, 2016 at 11:50:53AM +0900, Akira Yokosawa wrote:
> >From 3b2c58c7e7f7abe4303383437502513f948d7401 Mon Sep 17 00:00:00 2001
> From: Akira Yokosawa <akiyks@xxxxxxxxx>
> Date: Sat, 18 Jun 2016 10:38:57 +0900
> Subject: [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables
>
> Numbers given in the 'Comms Fabric' and 'Global Comms' rows in
> Table D.1 seem inconsistent.
>
> The 'Comms Fabric' latency in Table 3.1 is 3 microseconds.
> The latency of InfiniBand DDR, which was available in 2005 (at the
> time of the AMD Opteron 844), is 2.5 microseconds.
> The 'Comms Fabric' latency in Table D.1 is 4.5 microseconds.
> The latency of InfiniBand QDR, which was available in 2009 (at the
> time of the Intel X5550 (Nehalem)), is 1.3 microseconds.
> These latencies are for one-way communication.
> In the other rows of the tables, costs are for at least one
> round trip, so we need to double these numbers for consistency.
>
> For 'Comms Fabric', it would be better to use 5 microseconds in
> Table 3.1 and 2.6 microseconds in Table D.1.
>
> Of course, these numbers are for best cases.  Actual latency would
> depend on the topology and the configuration of the fabric.
>
> The 'Global Comms' latency in Table 3.1 is 130 ms.
> This is based on the speed of light in vacuum.
> On the other hand, the 'Global Comms' latency in Table D.1 is 195 ms.
> This is based on the speed of light in optical fiber.
> The number in Table D.1 is more realistic, and we should use it
> in both tables.
>
> This commit fixes these inconsistencies and modifies the related
> explanation in the text accordingly.
>
> Suggested-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx>

Nice!!!  Applied and pushed.

							Thanx, Paul

> ---
>  cpu/overheads.tex | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/cpu/overheads.tex b/cpu/overheads.tex
> index 311c43e..bfdd711 100644
> --- a/cpu/overheads.tex
> +++ b/cpu/overheads.tex
> @@ -126,12 +126,12 @@ This simplified sequence is just the beginning of a discipline called
>  	\hline
>  	CAS cache miss	& 306.0	& 510.0 \\
>  	\hline
> -	Comms Fabric	& 3,000\textcolor{white}{.0}
> -			& 5,000\textcolor{white}{.0}
> +	Comms Fabric	& 5,000\textcolor{white}{.0}
> +			& 8,330\textcolor{white}{.0}
>  	\\
>  	\hline
> -	Global Comms	& 130,000,000\textcolor{white}{.0}
> -			& 216,000,000\textcolor{white}{.0}
> +	Global Comms	& 195,000,000\textcolor{white}{.0}
> +			& 325,000,000\textcolor{white}{.0} \\
>  	\\
>  \end{tabular}
>  \caption{Performance of Synchronization Mechanisms on 4-CPU 1.8GHz AMD Opteron 844 System}
> @@ -224,11 +224,11 @@ global agreement.
>  	\hline
>  	CAS cache miss	& 95.9	& 266.4 \\
>  	\hline
> -	Comms Fabric	& 4,500\textcolor{white}{.0}
> -			& 7,500\textcolor{white}{.0} \\
> +	Comms Fabric	& 2,600\textcolor{white}{.0}
> +			& 7,220\textcolor{white}{.0} \\
>  	\hline
>  	Global Comms	& 195,000,000\textcolor{white}{.0}
> -			& 324,000,000\textcolor{white}{.0} \\
> +			& 542,000,000\textcolor{white}{.0} \\
>  \end{tabular}
>  \caption{Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
>  \label{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
> @@ -264,15 +264,19 @@ I/O operations are even more expensive.
>  As shown in the ``Comms Fabric'' row,
>  high performance (and expensive!) communications fabric, such as
>  InfiniBand or any number of proprietary interconnects, has a latency
> -of roughly three microseconds, during which time five \emph{thousand}
> -instructions might have been executed.
> +of roughly five microseconds for an end-to-end round trip, during which
> +time more than eight \emph{thousand} instructions might have been executed.
>  Standards-based communications networks often require some sort of
>  protocol processing, which further increases the latency.
>  Of course, geographic distance also increases latency, with the
> -theoretical speed-of-light latency around the world coming to
> -roughly 130 \emph{milliseconds}, or more than 200 million clock
> +speed-of-light through optical fiber latency around the world coming to
> +roughly 195 \emph{milliseconds}, or more than 300 million clock
>  cycles, as shown in the ``Global Comms'' row.
>
> +% Reference of Infiniband latency:
> +% http://www.hpcadvisorycouncil.com/events/2014/swiss-workshop/presos/Day_1/1_Mellanox.pdf
> +% page 6/76 'Leading Interconnect, Leading Performance'
> +
>  \QuickQuiz{}
>  	These numbers are insanely large!
>  	How can I possibly get my head around them?
> --
> 1.9.1
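
As a cross-check on the new 'Ratio' column entries: the round-trip
'Comms Fabric' costs are twice the one-way latencies quoted in the
changelog (2 x 2.5 us for InfiniBand DDR, 2 x 1.3 us for QDR), and each
ratio is the cost in nanoseconds divided by the clock period that the
tables' existing rows already imply (306.0 / 510.0 = 0.6 ns for the
Opteron 844, 95.9 / 266.4 ~= 0.36 ns for the X5550).  Below is a
minimal C sketch of that arithmetic; it is not part of the patch, and
the ratio() helper and variable names are only for illustration:

#include <stdio.h>

/* 'Ratio' column of the tables: cost in nanoseconds divided by the
 * CPU clock period in nanoseconds. */
static double ratio(double cost_ns, double clock_period_ns)
{
	return cost_ns / clock_period_ns;
}

int main(void)
{
	/* Round-trip 'Comms Fabric' costs: twice the one-way latencies. */
	double opteron_fabric_ns = 2.0 * 2500.0;  /* InfiniBand DDR -> 5,000 ns */
	double x5550_fabric_ns = 2.0 * 1300.0;    /* InfiniBand QDR -> 2,600 ns */
	double global_comms_ns = 195000000.0;     /* optical fiber, once around */

	/* Clock periods implied by the existing CAS-cache-miss rows. */
	double opteron_period_ns = 0.6;           /* 306.0 / 510.0 */
	double x5550_period_ns = 0.36;            /* 95.9 / 266.4, rounded */

	printf("Opteron Comms Fabric: %.0f\n",
	       ratio(opteron_fabric_ns, opteron_period_ns));
	printf("Opteron Global Comms: %.0f\n",
	       ratio(global_comms_ns, opteron_period_ns));
	printf("X5550 Comms Fabric:   %.0f\n",
	       ratio(x5550_fabric_ns, x5550_period_ns));
	printf("X5550 Global Comms:   %.0f\n",
	       ratio(global_comms_ns, x5550_period_ns));
	return 0;
}

This prints 8333, 325000000, 7222, and 541666667; rounded to three
significant figures, those are exactly the 8,330, 325,000,000, 7,220,
and 542,000,000 entries in the updated tables.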