Re: [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables

On Sat, Jun 18, 2016 at 11:50:53AM +0900, Akira Yokosawa wrote:
> From 3b2c58c7e7f7abe4303383437502513f948d7401 Mon Sep 17 00:00:00 2001
> From: Akira Yokosawa <akiyks@xxxxxxxxx>
> Date: Sat, 18 Jun 2016 10:38:57 +0900
> Subject: [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables
> 
> The numbers given in the 'Comms Fabric' and 'Global Comms' rows of
> Tables 3.1 and D.1 seem inconsistent.
> 
> The 'Comms Fabric' latency in Table 3.1 is 3 microseconds.
> The latency of InfiniBand DDR, which was available in 2005 (at the
> time of the AMD Opteron 844), is 2.5 microseconds.
> The 'Comms Fabric' latency in Table D.1 is 4.5 microseconds.
> The latency of InfiniBand QDR, which was available in 2009 (at the
> time of the Intel X5550 (Nehalem)), is 1.3 microseconds.
> These latencies are for one-way communication.
> In the other rows of the tables, the costs are for at least one
> round trip, so we need to double these numbers for consistency.
> 
> For 'Comms Fabric', it would be better to use 5 microseconds in
> Table 3.1 and 2.6 microseconds in Table D.1.
> 
> Of course, these numbers are for best cases.  Actual latency will
> depend on the topology and the configuration of the fabric.
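
A quick sanity check of the doubling arithmetic above, as a minimal
standalone C sketch (the one-way latencies are the DDR and QDR figures
quoted in the commit message; nothing here is taken from perfbook
itself):

	#include <stdio.h>

	int main(void)
	{
		double ddr_one_way_ns = 2500.0;	/* InfiniBand DDR, ca. 2005 */
		double qdr_one_way_ns = 1300.0;	/* InfiniBand QDR, ca. 2009 */

		/* The other table rows are round trips, so double these. */
		printf("Table 3.1 Comms Fabric: %.0f ns\n", 2.0 * ddr_one_way_ns);
		printf("Table D.1 Comms Fabric: %.0f ns\n", 2.0 * qdr_one_way_ns);
		return 0;
	}

This prints 5000 ns and 2600 ns, matching the values the patch puts
into the two tables.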
> 
> The 'Global Comms' latency in Table 3.1 is 130 ms.
> This is based on the speed of light in vacuum.
> On the other hand, the 'Global Comms' latency in Table D.1 is 195 ms.
> This is based on the speed of light in optical fiber.
> The number in Table D.1 is more realistic, and we should use it
> in both tables.
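
For the record, here is a minimal C sketch of where those two figures
come from; the circumference and refractive index are textbook values
assumed here, not taken from the patch:

	#include <stdio.h>

	int main(void)
	{
		double circumference_m = 4.0075e7;	/* Earth, equatorial */
		double c_vacuum_mps = 2.998e8;		/* speed of light in vacuum */
		double n_fiber = 1.46;			/* silica fiber, roughly */

		printf("vacuum: %.0f ms\n", 1e3 * circumference_m / c_vacuum_mps);
		printf("fiber:  %.0f ms\n",
		       1e3 * circumference_m * n_fiber / c_vacuum_mps);
		return 0;
	}

This prints roughly 134 ms and 195 ms, consistent with the 130 ms and
195 ms figures discussed above.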
> 
> This commit fixes these inconsistencies and modifies the related
> explanation in the text accordingly.
> 
> Suggested-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx>

Nice!!! Applied and pushed.

							Thanx, Paul

> ---
>  cpu/overheads.tex | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/cpu/overheads.tex b/cpu/overheads.tex
> index 311c43e..bfdd711 100644
> --- a/cpu/overheads.tex
> +++ b/cpu/overheads.tex
> @@ -126,12 +126,12 @@ This simplified sequence is just the beginning of a discipline called
>  	\hline
>  	CAS cache miss		&         306.0	&         510.0 \\
>  	\hline
> -	Comms Fabric		&       3,000\textcolor{white}{.0}
> -						&       5,000\textcolor{white}{.0}
> +	Comms Fabric		&       5,000\textcolor{white}{.0}
> +						&       8,330\textcolor{white}{.0}
>  								\\
>  	\hline
> -	Global Comms		& 130,000,000\textcolor{white}{.0}
> -						& 216,000,000\textcolor{white}{.0}
> +	Global Comms		& 195,000,000\textcolor{white}{.0}
> +						& 325,000,000\textcolor{white}{.0}
>  								\\
>  \end{tabular}
>  \caption{Performance of Synchronization Mechanisms on 4-CPU 1.8GHz AMD Opteron 844 System}
> @@ -224,11 +224,11 @@ global agreement.
>  	\hline
>  	CAS cache miss		&          95.9	&         266.4 \\
>  	\hline
> -	Comms Fabric		&       4,500\textcolor{white}{.0}
> -						&	7,500\textcolor{white}{.0} \\
> +	Comms Fabric		&       2,600\textcolor{white}{.0}
> +						&	7,220\textcolor{white}{.0} \\
>  	\hline
>  	Global Comms		& 195,000,000\textcolor{white}{.0}
> -						& 324,000,000\textcolor{white}{.0} \\
> +						& 542,000,000\textcolor{white}{.0} \\
>  \end{tabular}
>  \caption{Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
>  \label{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
> @@ -264,15 +264,19 @@ I/O operations are even more expensive.
>  As shown in the ``Comms Fabric'' row,
>  high performance (and expensive!) communications fabric, such as
>  InfiniBand or any number of proprietary interconnects, has a latency
> -of roughly three microseconds, during which time five \emph{thousand}
> -instructions might have been executed.
> +of roughly five microseconds for an end-to-end round trip, during which
> +time more than eight \emph{thousand} instructions might have been executed.
>  Standards-based communications networks often require some sort of
>  protocol processing, which further increases the latency.
>  Of course, geographic distance also increases latency, with the
> -theoretical speed-of-light latency around the world coming to
> -roughly 130 \emph{milliseconds}, or more than 200 million clock
> +speed-of-light-in-optical-fiber latency around the world coming to
> +roughly 195 \emph{milliseconds}, or more than 300 million clock
>  cycles, as shown in the ``Global Comms'' row.
> 
> +% Reference for InfiniBand latency:
> +% http://www.hpcadvisorycouncil.com/events/2014/swiss-workshop/presos/Day_1/1_Mellanox.pdf
> +%     page 6/76 'Leading Interconnect, Leading Performance'
> +
>  \QuickQuiz{}
>  	These numbers are insanely large!
>  	How can I possibly get my head around them?
> -- 
> 1.9.1
> 
> 
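
In case anyone wants to double-check the new ratio columns, here is a
small C sketch; the per-instruction baselines are recovered from the
CAS cache miss rows quoted above (cost in ns divided by ratio), and
the rest is just division:

	#include <stdio.h>

	int main(void)
	{
		double opteron_insn_ns = 306.0 / 510.0;	/* ~0.60 ns/instruction */
		double nehalem_insn_ns = 95.9 / 266.4;	/* ~0.36 ns/instruction */

		printf("Opteron Comms Fabric: %.0f\n", 5000.0 / opteron_insn_ns);
		printf("Opteron Global Comms: %.0f\n", 1.95e8 / opteron_insn_ns);
		printf("Nehalem Comms Fabric: %.0f\n", 2600.0 / nehalem_insn_ns);
		printf("Nehalem Global Comms: %.0f\n", 1.95e8 / nehalem_insn_ns);
		return 0;
	}

This prints approximately 8,330 and 325,000,000 for the Opteron table
and 7,220 and 542,000,000 for the Nehalem table, matching the numbers
the patch puts in the ratio columns.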
