Re: Two identical systems, radically different performance

On 10/15/2012 05:34 PM, Scott Marlowe wrote:
On Mon, Oct 15, 2012 at 9:28 AM, Claudio Freire <klaussfreire@xxxxxxxxx> wrote:
On Mon, Oct 15, 2012 at 12:24 PM, Andrea Suisani <sickpig@xxxxxxxxxxxx> wrote:
Sure, you're right.

It's just that my bet was on a higher throughput
when HT was disabled from the BIOS (as you stated
previously in this thread).

Yes, mine too. It's bizarre. If I were you, I'd look into it more
deeply. It may be a flaw in your test methodology (maybe you disabled
the wrong cores?). If not, it would be good to know where the extra TPS
comes from, so it can be replicated elsewhere.

I'd recommend more synthetic benchmarks when trying to compare systems
like this.  bonnie++,

You were right. bonnie++ (-f -n 0 -c 4) shows that there's very little (if any)
difference in sequential input whether or not the cache is enabled on the
RAID1 (SAS 15K, sdb).

I've run 2 bonnie++ tests with the cache both enabled and disabled, and what I get
(see attachments for more details) is 400MB/s sequential input (cache) vs
390MB/s (no cache).

I don't know why, but I would have expected a larger delta (given the 512MB cache),
not a mere 10MB/s; that's only based on my gut feeling, though.

I've also tested the RAID1 array where the OS is installed (2 SATA 7.2K rpm, sda),
just to verify whether the cache effect is comparable with the one I get from the SAS disks.

Well, it seems there's no cache effect, or if there is one, it's so small that it
can't be distinguished from the noise.

Both arrays are configured with these parameters:

Read Policy               : Adaptive Read Ahead
Write Policy              : Write Back
Stripe Element Size       : 64 KB
Disk Cache Policy         : Disabled

These tests were performed with HT disabled from the BIOS, but without
using the noht kernel boot parameter. The I/O scheduler for sdb was set to
deadline, while sda used the default cfq.
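To double-check which scheduler each disk is really using, the kernel exposes it in /sys/block/<device>/queue/scheduler, where the active one is shown in square brackets (for example "noop [deadline] cfq"); as root, writing a name into that file (echo deadline > /sys/block/sdb/queue/scheduler) switches it. A small Python sketch to read it back, assuming a Linux host with sysfs mounted at /sys; sda and sdb are just the device names from this thread:

```python
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the scheduler marked with [brackets], e.g. 'noop [deadline] cfq'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler marked in {sysfs_text!r}")


def read_scheduler(device: str) -> str:
    """Read /sys/block/<device>/queue/scheduler (Linux only)."""
    text = Path("/sys/block", device, "queue", "scheduler").read_text()
    return active_scheduler(text)


if __name__ == "__main__":
    for dev in ("sda", "sdb"):  # device names from this thread
        try:
            print(dev, read_scheduler(dev))
        except (OSError, ValueError) as exc:
            print(dev, "not available:", exc)
```

Checking this right before each bonnie++ run makes it harder for a scheduler change to slip in between "cache" and "no cache" tests unnoticed.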

 the memory stream test that Greg Smith was
working on, and so on.

This one, https://github.com/gregs1104/stream-scaling, right?

I've executed the test with HT enabled, HT disabled from the BIOS,
and HT disabled using the sysfs interface. Attached are 3 graphs and the
related text files.
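On the "wrong cores" question: when disabling HT through sysfs, the logical CPUs that share a physical core are listed in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, and a CPU is taken offline by writing 0 to /sys/devices/system/cpu/cpuN/online. Sibling numbering is not necessarily contiguous (a core can show up as "0,6" rather than "0-1"), so choosing CPUs by sibling list rather than by number seems safer. A minimal Python sketch that only computes which CPUs to offline and writes nothing; the topologies used to exercise it are hypothetical:

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Expand a sysfs CPU list such as '0,6' or '0-1,6-7' into integers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpus_to_offline(siblings: dict[int, str]) -> list[int]:
    """Keep the lowest-numbered thread of every core, return all other CPUs."""
    keep = {min(parse_cpu_list(text)) for text in siblings.values()}
    return sorted(cpu for cpu in siblings if cpu not in keep)


def read_siblings(root: str = "/sys/devices/system/cpu") -> dict[int, str]:
    """Map each CPU number to its thread_siblings_list text (Linux only)."""
    result: dict[int, str] = {}
    for cpu_dir in Path(root).glob("cpu[0-9]*"):
        path = cpu_dir / "topology" / "thread_siblings_list"
        if path.exists():
            result[int(cpu_dir.name[3:])] = path.read_text()
    return result


if __name__ == "__main__":
    # CPUs one would write 0 to .../cpuN/online for (applying it needs root).
    print(cpus_to_offline(read_siblings()))
```

Run it with all CPUs online: once some threads are offline, the remaining sibling lists shrink and the computation no longer reflects the full topology.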


Get an idea of what core differences the machines
display under such testing.

I'm trying... hard :)

Andrea


Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   4     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cloud        32000M           180829  23 112581  15           401611  19 917.7  10
Latency                         445ms     954ms               360ms   64788us

1.96,1.96,cloud,4,1350463530,32000M,,,,180829,23,112581,15,,,401611,19,917.7,10,,,,,,,,,,,,,,,,,,,445ms,954ms,,360ms,64788us,,,,,,
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   4     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cloud        32000M           179259  23 112093  15           400021  18 600.9  26
Latency                         670ms     514ms             99640us   86081us

1.96,1.96,cloud,4,1350465025,32000M,,,,179259,23,112093,15,,,400021,18,600.9,26,,,,,,,,,,,,,,,,,,,670ms,514ms,,99640us,86081us,,,,,,
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   4     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cloud        32000M           174212  22 108909  15           387598  18 928.9  11
Latency                         960ms    1397ms               967ms   71495us

1.96,1.96,cloud,4,1350465002,32000M,,,,174212,22,108909,15,,,387598,18,928.9,11,,,,,,,,,,,,,,,,,,,960ms,1397ms,,967ms,71495us,,,,,,
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   4     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cloud        32000M           175185  22 109229  15           391409  18 912.9  10
Latency                        1002ms    1225ms              1059ms   91373us

1.96,1.96,cloud,4,1350466614,32000M,,,,175185,22,109229,15,,,391409,18,912.9,10,,,,,,,,,,,,,,,,,,,1002ms,1225ms,,1059ms,91373us,,,,,,
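To compare runs without eyeballing the tables, the CSV lines bonnie++ prints can be parsed. A minimal Python sketch; the column positions are my reading of the 1.96 CSV lines above (for instance 401611 sits in the Sequential Input Block column), not taken from the bonnie++ documentation, and the lines are trimmed after the seek columns. The post doesn't label which run used the cache, so the code only extracts the numbers; the percentage uses the rounded 400 vs 390 MB/s figures quoted earlier.

```python
# Column positions inferred from the bonnie++ 1.96 CSV lines in this post
# (assumption: not checked against the bonnie++ docs).
FIELDS = {
    "seq_output_block_kps": 9,   # Sequential Output, Block, K/sec
    "rewrite_kps": 11,           # Sequential Output, Rewrite, K/sec
    "seq_input_block_kps": 15,   # Sequential Input, Block, K/sec
    "random_seeks_per_s": 17,    # Random Seeks, /sec
}

# The four CSV lines from above, trimmed after the seek CPU column.
RUNS = [
    "1.96,1.96,cloud,4,1350463530,32000M,,,,180829,23,112581,15,,,401611,19,917.7,10",
    "1.96,1.96,cloud,4,1350465025,32000M,,,,179259,23,112093,15,,,400021,18,600.9,26",
    "1.96,1.96,cloud,4,1350465002,32000M,,,,174212,22,108909,15,,,387598,18,928.9,11",
    "1.96,1.96,cloud,4,1350466614,32000M,,,,175185,22,109229,15,,,391409,18,912.9,10",
]


def parse_bonnie_csv(line: str) -> dict:
    """Extract block throughput and seek rate from one bonnie++ CSV line."""
    cols = line.strip().split(",")
    return {name: float(cols[idx]) for name, idx in FIELDS.items()}


def relative_delta_pct(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return (a - b) / b * 100.0


if __name__ == "__main__":
    for line in RUNS:
        r = parse_bonnie_csv(line)
        # Dividing K/sec by 1000 roughly matches the MB/s figures quoted in the post.
        print(f"seq input {r['seq_input_block_kps'] / 1000:.0f} MB/s, "
              f"seeks {r['random_seeks_per_s']:.1f}/s")
    # Rounded cache vs no-cache figures from the post: 400 vs 390 MB/s.
    print(f"delta: {relative_delta_pct(400, 390):.1f}%")
```

A gap of roughly 2.6% is small enough that two runs per configuration may not separate it from run-to-run variation, which fits the "confused with the noise" impression above.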

Attachment: stream-ht_disabled_bios.png
Description: PNG image

=== CPU cache information ===
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu0 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu0 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu1 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu2 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu2 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu2 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu2 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu3 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu3 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu3 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu3 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu4 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu4 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu4 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu4 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu5 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu5 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu5 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu5 Level 3 Cache: 15360K (Unified)
Total CPU system cache: 96141312 bytes
Suggested minimum array elements needed: 43700596
Array elements used: 43700596
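For reference, the cache total and memory figures can be reproduced by hand. Summing only the data and unified caches (32K + 256K + 15360K) over the six logical CPUs gives exactly 96141312 bytes, so the script appears to skip instruction caches. It also counts the shared 15360K L3 once per CPU, which overstates the real cache and errs toward bigger arrays. STREAM's "Total memory required" matches three arrays of 8-byte doubles in units of 2^20 bytes. A Python check, based on my reading of the numbers rather than on the script's source:

```python
KIB = 1024
MIB = 1024 * 1024

# (level, size in KiB, type) per logical CPU, as listed above for cpu0..cpu5.
PER_CPU_CACHES = [
    (1, 32, "Data"),
    (1, 32, "Instruction"),
    (2, 256, "Unified"),
    (3, 15360, "Unified"),
]


def total_cache_bytes(n_cpus: int, caches=PER_CPU_CACHES) -> int:
    """Sum data + unified caches over all logical CPUs, skipping instruction caches."""
    per_cpu_kib = sum(size for _, size, kind in caches if kind != "Instruction")
    return n_cpus * per_cpu_kib * KIB


def stream_memory_mib(array_elements: int, word_bytes: int = 8, arrays: int = 3) -> float:
    """Memory for STREAM's three double arrays (a, b, c), in MiB."""
    return arrays * word_bytes * array_elements / MIB


if __name__ == "__main__":
    print(total_cache_bytes(6), f"{stream_memory_mib(43700596):.1f} MB")
    print(total_cache_bytes(12), f"{stream_memory_mib(87401192):.1f} MB")
```

The same arithmetic reproduces the HT-enabled figures (192282624 bytes, 2000.5 MB), since twelve logical CPUs simply double every per-CPU sum.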

=== CPU Core Summary ===
processor       : 5
model name      : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
cpu MHz         : 1999.842
siblings        : 6

=== Check and build stream ===

=== Testing up to 6 cores ===

-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 43700596, Offset = 0
Total memory required = 1000.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 1
-------------------------------------------------------------
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 42747 microseconds.
   (= 42747 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       11189.6576       0.0625       0.0625       0.0626
Scale:      11226.3233       0.0623       0.0623       0.0624
Add:        12419.6669       0.0845       0.0844       0.0847
Triad:      12282.1773       0.0855       0.0854       0.0856
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

Number of Threads requested = 2
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      21066.0085       0.0500       0.0498       0.0502

Number of Threads requested = 3
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23206.8603       0.0453       0.0452       0.0455

Number of Threads requested = 4
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23555.5498       0.0446       0.0445       0.0446

Number of Threads requested = 5
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23424.7239       0.0448       0.0448       0.0449

Number of Threads requested = 6
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23298.1809       0.0451       0.0450       0.0452
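To read the scaling curve numerically, one can compute the speedup over one thread and the gain from each extra thread. A Python sketch using the Triad figures from the 6-core run above, assuming it is the BIOS-disabled run, as its position after stream-ht_disabled_bios.png suggests; the 2% "saturated" threshold is an arbitrary choice of mine:

```python
def scaling_summary(triad_mb_s: dict[int, float]) -> list[tuple[int, float, float]]:
    """Per thread count: (threads, speedup over 1 thread, % gain over previous count)."""
    base = triad_mb_s[1]
    rows = []
    prev = None
    for threads in sorted(triad_mb_s):
        rate = triad_mb_s[threads]
        gain = 0.0 if prev is None else (rate - prev) / prev * 100.0
        rows.append((threads, rate / base, gain))
        prev = rate
    return rows


def saturation_point(triad_mb_s: dict[int, float], threshold_pct: float = 2.0) -> int:
    """Last thread count before adding one more thread gains less than threshold_pct."""
    counts = sorted(triad_mb_s)
    for prev, cur in zip(counts, counts[1:]):
        gain = (triad_mb_s[cur] - triad_mb_s[prev]) / triad_mb_s[prev] * 100.0
        if gain < threshold_pct:
            return prev
    return counts[-1]


# Triad MB/s from the 6-core run above (assumed: HT disabled in the BIOS).
BIOS_HT_OFF = {1: 12282.1773, 2: 21066.0085, 3: 23206.8603,
               4: 23555.5498, 5: 23424.7239, 6: 23298.1809}

if __name__ == "__main__":
    for threads, speedup, gain in scaling_summary(BIOS_HT_OFF):
        print(f"{threads} threads: {speedup:.2f}x, {gain:+.1f}% vs previous")
    print("saturates after", saturation_point(BIOS_HT_OFF), "threads")
```

With these numbers, memory bandwidth stops growing meaningfully after about 3 threads, so extra cores add little for a bandwidth-bound workload.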

Attachment: stream-ht_disabled_sysfs.png
Description: PNG image

-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 43700596, Offset = 0
Total memory required = 1000.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 6
-------------------------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 26278 microseconds.
   (= 26278 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       20539.5415       0.0347       0.0340       0.0392
Scale:      20521.5758       0.0347       0.0341       0.0390
Add:        23627.7925       0.0455       0.0444       0.0526
Triad:      23673.6951       0.0450       0.0443       0.0499
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
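The rates STREAM prints follow from the array size and the best (Min) time. Copy (c = a) and Scale (b = scalar*c) touch two arrays per element; Add (c = a + b) and Triad (a = b + scalar*c) touch three. The rate is bytes moved divided by the best time, in 10^6 bytes per second. A Python sketch, assuming the STREAM 5.9 kernel definitions; because the Min time column is rounded to four decimals, recomputing from it lands within a fraction of a percent of the printed rate:

```python
WORD_BYTES = 8  # "This system uses 8 bytes per DOUBLE PRECISION word."

# Arrays read or written per element by each STREAM kernel.
ARRAYS_TOUCHED = {
    "Copy": 2,   # c = a
    "Scale": 2,  # b = scalar * c
    "Add": 3,    # c = a + b
    "Triad": 3,  # a = b + scalar * c
}


def stream_rate_mb_s(kernel: str, array_elements: int, min_time_s: float) -> float:
    """Bandwidth as STREAM reports it: bytes moved / best time, in 10**6 bytes/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * WORD_BYTES * array_elements
    return 1e-6 * bytes_moved / min_time_s


if __name__ == "__main__":
    n = 43700596  # "Array size" in the 6-thread run above
    for kernel, t in [("Copy", 0.0340), ("Scale", 0.0341), ("Add", 0.0444), ("Triad", 0.0443)]:
        print(f"{kernel}: {stream_rate_mb_s(kernel, n, t):.0f} MB/s")
```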

Attachment: stream-ht_enabled.png
Description: PNG image

=== CPU cache information ===
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu0 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu0 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu1 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu10 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu10 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu10 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu10 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu11 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu11 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu11 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu11 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu2 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu2 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu2 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu2 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu3 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu3 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu3 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu3 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu4 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu4 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu4 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu4 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu5 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu5 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu5 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu5 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu6 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu6 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu6 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu6 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu7 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu7 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu7 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu7 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu8 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu8 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu8 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu8 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu9 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu9 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu9 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu9 Level 3 Cache: 15360K (Unified)
Total CPU system cache: 192282624 bytes
Suggested minimum array elements needed: 87401192
Array elements used: 87401192

=== CPU Core Summary ===
processor	: 11
model name	: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
cpu MHz		: 1999.946
siblings	: 12

=== Check and build stream ===

=== Testing up to 12 cores ===

-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 87401192, Offset = 0
Total memory required = 2000.5 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 1
-------------------------------------------------------------
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 87534 microseconds.
   (= 87534 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       10611.6239       0.1319       0.1318       0.1320
Scale:      10975.4789       0.1275       0.1274       0.1277
Add:        11859.7763       0.1769       0.1769       0.1771
Triad:      12273.3535       0.1710       0.1709       0.1712
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

Number of Threads requested = 2
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      21378.2018       0.0983       0.0981       0.0985

Number of Threads requested = 3
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      24124.2561       0.0875       0.0870       0.0883

Number of Threads requested = 4
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      24631.0616       0.0855       0.0852       0.0858

Number of Threads requested = 5
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      24246.1186       0.0868       0.0865       0.0872

Number of Threads requested = 6
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23878.2058       0.0880       0.0878       0.0884

Number of Threads requested = 7
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      22625.2297       0.0952       0.0927       0.0990

Number of Threads requested = 8
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23126.5826       0.0982       0.0907       0.1056

Number of Threads requested = 9
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23425.1605       0.0950       0.0895       0.1040

Number of Threads requested = 10
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      22919.8752       0.0937       0.0915       0.0954

Number of Threads requested = 11
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23267.4353       0.0947       0.0902       0.1027

Number of Threads requested = 12
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23229.3473       0.0905       0.0903       0.0907
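To put the three configurations side by side, one option is to compare each run's best Triad figure. A Python sketch using the numbers from the attachments in this post; which text belongs to the BIOS run and which to the sysfs run is inferred from the attachment order, and the sysfs attachment only shows a 6-thread result. With a single run per configuration, a difference of a few percent between peaks is a hint, not a result.

```python
def peak(triad_mb_s: dict[int, float]) -> tuple[int, float]:
    """Thread count with the best Triad rate, and that rate."""
    threads = max(triad_mb_s, key=triad_mb_s.get)
    return threads, triad_mb_s[threads]


def pct_diff(a: float, b: float) -> float:
    """How much higher a is than b, in percent."""
    return (a - b) / b * 100.0


# Triad MB/s copied from the runs in this post.
HT_ON = {1: 12273.3535, 2: 21378.2018, 3: 24124.2561, 4: 24631.0616,
         5: 24246.1186, 6: 23878.2058, 7: 22625.2297, 8: 23126.5826,
         9: 23425.1605, 10: 22919.8752, 11: 23267.4353, 12: 23229.3473}
HT_OFF_BIOS = {1: 12282.1773, 2: 21066.0085, 3: 23206.8603,
               4: 23555.5498, 5: 23424.7239, 6: 23298.1809}
HT_OFF_SYSFS = {6: 23673.6951}  # only a 6-thread result is shown

if __name__ == "__main__":
    on_threads, on_rate = peak(HT_ON)
    for name, run in [("BIOS", HT_OFF_BIOS), ("sysfs", HT_OFF_SYSFS)]:
        threads, rate = peak(run)
        print(f"HT on {on_rate:.0f} MB/s @ {on_threads} threads vs "
              f"HT off ({name}) {rate:.0f} MB/s @ {threads}: "
              f"{pct_diff(on_rate, rate):+.1f}%")
```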

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
