On 10/15/2012 05:34 PM, Scott Marlowe wrote:
On Mon, Oct 15, 2012 at 9:28 AM, Claudio Freire <klaussfreire@xxxxxxxxx> wrote:
On Mon, Oct 15, 2012 at 12:24 PM, Andrea Suisani <sickpig@xxxxxxxxxxxx> wrote:
sure you're right. It's just that my bet was on a higher throughput when HT was disabled from the BIOS (as you stated previously in this thread).

Yes, mine too. It's bizarre. If I were you, I'd look into it more deeply. It may be a flaw in your test methodology (maybe you disabled the wrong cores?). If not, it would be good to know why the extra TPS to replicate elsewhere.

I'd recommend more synthetic benchmarks when trying to compare systems like this. bonnie++,
you were right. bonnie++ (-f -n 0 -c 4) shows that there's very little (if any) difference in sequential input whether or not the cache is enabled on the RAID1 (SAS 15K, sdb).

I've run 2 bonnie++ tests with the cache both enabled and disabled, and what I get (see attachments for more details) is 400MB/s sequential input (cache) vs 390MB/s (nocache).

I dunno why, but I would have expected a higher delta (due to the 512MB cache), not a mere 10MB/s. This is only based on my gut feeling, though.

I've also tested the RAID1 array where the OS is installed (2 SATA 7.2Krpm, sda), just to verify whether the cache effect is comparable with the one I get from the SAS disks.

Well, it seems that there's no cache effect, or if there is one it's so small as to be lost in the noise.

Both arrays are configured with these params:

Read Policy : Adaptive Read Ahead
Write Policy : Write Back
Stripe Element Size : 64 KB
Disk Cache Policy : Disabled

These tests were performed with HT disabled from the BIOS, but without using the noht kernel boot param. The scheduler for sdb was set to deadline, while sda used the default cfq.
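To put the 10MB/s delta in relative terms, here is a minimal sketch of the arithmetic. It uses only the two rounded rates quoted above; the function and variable names are made up for illustration.

```python
# Sequential block-read rates quoted in the message above, in MB/s (rounded).
cached_mb_s = 400.0    # controller cache enabled
uncached_mb_s = 390.0  # controller cache disabled


def relative_gain(with_cache: float, without_cache: float) -> float:
    """Return the gain of with_cache over without_cache, as a percentage."""
    return (with_cache - without_cache) / without_cache * 100.0


gain = relative_gain(cached_mb_s, uncached_mb_s)
print(f"cache gain: {gain:.1f}%")  # prints "cache gain: 2.6%"
```

A gain of this size is in the same range as the spread between the individual bonnie++ runs, so it may well be noise. One possible explanation, offered only as a guess: the 32000M test file is far larger than the 512MB controller cache, so a long sequential read can benefit from the cache only marginally.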
the memory stream test that Greg Smith was working on, and so on.
this one https://github.com/gregs1104/stream-scaling, right? I've executed the test with HT enabled, HT disabled from the BIOS, and HT disabled using the sysfs interface. Attached are 3 graphs and the related text files.
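Since the question of disabling the wrong cores came up earlier in the thread, here is a rough sketch of how the hyperthread siblings could be identified before taking them offline through sysfs. It assumes a Linux kernel that exposes `topology/thread_siblings_list` under `/sys/devices/system/cpu/cpuN/`; the helper names and the sibling numbering in the usage note are hypothetical and were not taken from the machine tested here.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Parse a sysfs CPU list such as '0,6' or '0-1' into sorted CPU numbers."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def siblings_to_offline(sibling_lists: dict[int, str]) -> list[int]:
    """Keep the lowest-numbered thread of each core and return all the others."""
    offline: set[int] = set()
    for text in sibling_lists.values():
        threads = parse_cpu_list(text)
        offline.update(threads[1:])
    return sorted(offline)


def read_sibling_lists(root: str = "/sys/devices/system/cpu") -> dict[int, str]:
    """Read thread_siblings_list for every online CPU found under root."""
    result: dict[int, str] = {}
    for path in Path(root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(path.parent.parent.name[3:])  # directory name is "cpuN"
        result[cpu] = path.read_text()
    return result
```

With a hypothetical layout where cpu0 and cpu6 share a core, `siblings_to_offline({0: "0,6", 6: "0,6"})` returns `[6]`. Each CPU in the returned list would then be disabled by writing `0` to its `/sys/devices/system/cpu/cpuN/online` file as root.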
Get an idea what core differences the machines display under such testing.
I'm trying... hard :)

Andrea
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   4     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cloud        32000M           180829  23 112581  15           401611  19 917.7  10
Latency                         445ms     954ms               360ms   64788us
1.96,1.96,cloud,4,1350463530,32000M,,,,180829,23,112581,15,,,401611,19,917.7,10,,,,,,,,,,,,,,,,,,,445ms,954ms,,360ms,64788us,,,,,,

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   4     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cloud        32000M           179259  23 112093  15           400021  18 600.9  26
Latency                         670ms     514ms             99640us   86081us
1.96,1.96,cloud,4,1350465025,32000M,,,,179259,23,112093,15,,,400021,18,600.9,26,,,,,,,,,,,,,,,,,,,670ms,514ms,,99640us,86081us,,,,,,

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   4     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cloud        32000M           174212  22 108909  15           387598  18 928.9  11
Latency                         960ms    1397ms               967ms   71495us
1.96,1.96,cloud,4,1350465002,32000M,,,,174212,22,108909,15,,,387598,18,928.9,11,,,,,,,,,,,,,,,,,,,960ms,1397ms,,967ms,71495us,,,,,,

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   4     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cloud        32000M           175185  22 109229  15           391409  18 912.9  10
Latency                        1002ms    1225ms              1059ms   91373us
1.96,1.96,cloud,4,1350466614,32000M,,,,175185,22,109229,15,,,391409,18,912.9,10,,,,,,,,,,,,,,,,,,,1002ms,1225ms,,1059ms,91373us,,,,,,
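The CSV line that bonnie++ prints after each table is easier to compare across runs than the table itself. The sketch below pulls the block write, rewrite, block read, and seek figures out of the four CSV lines above. The column positions are an assumption: they were inferred by lining the CSV values up with the human-readable tables, not taken from the bonnie++ documentation, and the names `COLUMNS` and `parse_bonnie_csv` are made up for illustration.

```python
import csv
import io

# The four CSV summary lines from the bonnie++ runs above, copied verbatim.
CSV_LINES = [
    "1.96,1.96,cloud,4,1350463530,32000M,,,,180829,23,112581,15,,,401611,19,917.7,10,,,,,,,,,,,,,,,,,,,445ms,954ms,,360ms,64788us,,,,,,",
    "1.96,1.96,cloud,4,1350465025,32000M,,,,179259,23,112093,15,,,400021,18,600.9,26,,,,,,,,,,,,,,,,,,,670ms,514ms,,99640us,86081us,,,,,,",
    "1.96,1.96,cloud,4,1350465002,32000M,,,,174212,22,108909,15,,,387598,18,928.9,11,,,,,,,,,,,,,,,,,,,960ms,1397ms,,967ms,71495us,,,,,,",
    "1.96,1.96,cloud,4,1350466614,32000M,,,,175185,22,109229,15,,,391409,18,912.9,10,,,,,,,,,,,,,,,,,,,1002ms,1225ms,,1059ms,91373us,,,,,,",
]

# Assumed column positions (0-based), inferred from the tables above.
COLUMNS = {
    "block_write_kps": 9,
    "rewrite_kps": 11,
    "block_read_kps": 15,
    "seeks_per_s": 17,
}


def parse_bonnie_csv(line: str) -> dict[str, float]:
    """Extract the selected throughput figures from one bonnie++ CSV line."""
    row = next(csv.reader(io.StringIO(line)))
    return {name: float(row[idx]) for name, idx in COLUMNS.items()}


runs = [parse_bonnie_csv(line) for line in CSV_LINES]
for number, run in enumerate(runs, start=1):
    print(
        f"run {number}: block read {run['block_read_kps']:.0f} K/sec, "
        f"seeks {run['seeks_per_s']:.1f}/sec"
    )
```

The runs are numbered in the order they appear above; which ones had the controller cache enabled is not stated in the output itself, so the sketch does not label them.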
Attachment: stream-ht_disabled_bios.png (PNG image)
=== CPU cache information ===
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu0 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu0 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu1 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu2 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu2 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu2 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu2 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu3 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu3 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu3 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu3 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu4 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu4 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu4 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu4 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu5 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu5 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu5 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu5 Level 3 Cache: 15360K (Unified)
Total CPU system cache: 96141312 bytes
Suggested minimum array elements needed: 43700596
Array elements used: 43700596

=== CPU Core Summary ===
processor : 5
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
cpu MHz : 1999.842
siblings : 6

=== Check and build stream ===

=== Testing up to 6 cores ===

-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 43700596, Offset = 0
Total memory required = 1000.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 1
-------------------------------------------------------------
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 42747 microseconds.
   (= 42747 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       11189.6576       0.0625       0.0625       0.0626
Scale:      11226.3233       0.0623       0.0623       0.0624
Add:        12419.6669       0.0845       0.0844       0.0847
Triad:      12282.1773       0.0855       0.0854       0.0856
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

Number of Threads requested = 2
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      21066.0085       0.0500       0.0498       0.0502

Number of Threads requested = 3
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23206.8603       0.0453       0.0452       0.0455

Number of Threads requested = 4
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23555.5498       0.0446       0.0445       0.0446

Number of Threads requested = 5
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23424.7239       0.0448       0.0448       0.0449

Number of Threads requested = 6
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23298.1809       0.0451       0.0450       0.0452
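To read the scaling out of this output, the Triad rate for each thread count can be divided by the single-thread rate. The following is a minimal sketch that does this for the HT-disabled (BIOS) run; the embedded sample is an abridged copy of the output above, and the regular expression and names such as `triad_rates` are illustrative, not part of stream-scaling.

```python
import re

# Abridged copy of the stream-scaling output above (HT disabled from the BIOS).
SAMPLE = """\
Number of Threads requested = 1
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       11189.6576       0.0625       0.0625       0.0626
Scale:      11226.3233       0.0623       0.0623       0.0624
Add:        12419.6669       0.0845       0.0844       0.0847
Triad:      12282.1773       0.0855       0.0854       0.0856
Number of Threads requested = 2
Triad:      21066.0085       0.0500       0.0498       0.0502
Number of Threads requested = 3
Triad:      23206.8603       0.0453       0.0452       0.0455
Number of Threads requested = 4
Triad:      23555.5498       0.0446       0.0445       0.0446
Number of Threads requested = 5
Triad:      23424.7239       0.0448       0.0448       0.0449
Number of Threads requested = 6
Triad:      23298.1809       0.0451       0.0450       0.0452
"""

# Match each thread count with the first Triad rate that follows it.
PATTERN = re.compile(
    r"Number of Threads requested = (\d+).*?Triad:\s+([\d.]+)", re.DOTALL
)


def triad_rates(text: str) -> dict[int, float]:
    """Map thread count to Triad rate in MB/s."""
    return {int(n): float(rate) for n, rate in PATTERN.findall(text)}


rates = triad_rates(SAMPLE)
speedup = {n: rate / rates[1] for n, rate in rates.items()}
best = max(rates, key=rates.get)

for n in sorted(rates):
    print(f"{n} threads: {rates[n]:10.1f} MB/s  ({speedup[n]:.2f}x)")
print(f"peak at {best} threads")
```

On these numbers the bandwidth roughly doubles going from 1 to 4 threads and then flattens, which suggests the memory subsystem rather than the core count is the limit beyond that point.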
Attachment: stream-ht_disabled_sysfs.png (PNG image)
-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 43700596, Offset = 0
Total memory required = 1000.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 6
-------------------------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 26278 microseconds.
   (= 26278 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       20539.5415       0.0347       0.0340       0.0392
Scale:      20521.5758       0.0347       0.0341       0.0390
Add:        23627.7925       0.0455       0.0444       0.0526
Triad:      23673.6951       0.0450       0.0443       0.0499
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
Attachment: stream-ht_enabled.png (PNG image)
=== CPU cache information ===
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu0 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu0 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu1 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu10 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu10 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu10 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu10 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu11 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu11 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu11 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu11 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu2 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu2 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu2 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu2 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu3 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu3 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu3 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu3 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu4 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu4 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu4 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu4 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu5 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu5 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu5 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu5 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu6 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu6 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu6 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu6 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu7 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu7 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu7 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu7 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu8 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu8 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu8 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu8 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu9 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu9 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu9 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu9 Level 3 Cache: 15360K (Unified)
Total CPU system cache: 192282624 bytes
Suggested minimum array elements needed: 87401192
Array elements used: 87401192

=== CPU Core Summary ===
processor : 11
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
cpu MHz : 1999.946
siblings : 12

=== Check and build stream ===

=== Testing up to 12 cores ===

-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 87401192, Offset = 0
Total memory required = 2000.5 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 1
-------------------------------------------------------------
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 87534 microseconds.
   (= 87534 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       10611.6239       0.1319       0.1318       0.1320
Scale:      10975.4789       0.1275       0.1274       0.1277
Add:        11859.7763       0.1769       0.1769       0.1771
Triad:      12273.3535       0.1710       0.1709       0.1712
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

Number of Threads requested = 2
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      21378.2018       0.0983       0.0981       0.0985

Number of Threads requested = 3
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      24124.2561       0.0875       0.0870       0.0883

Number of Threads requested = 4
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      24631.0616       0.0855       0.0852       0.0858

Number of Threads requested = 5
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      24246.1186       0.0868       0.0865       0.0872

Number of Threads requested = 6
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23878.2058       0.0880       0.0878       0.0884

Number of Threads requested = 7
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      22625.2297       0.0952       0.0927       0.0990

Number of Threads requested = 8
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23126.5826       0.0982       0.0907       0.1056

Number of Threads requested = 9
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23425.1605       0.0950       0.0895       0.1040

Number of Threads requested = 10
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      22919.8752       0.0937       0.0915       0.0954

Number of Threads requested = 11
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23267.4353       0.0947       0.0902       0.1027

Number of Threads requested = 12
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      23229.3473       0.0905       0.0903       0.0907
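For a side-by-side view of the HT-enabled run against the run with HT disabled from the BIOS, the Triad rates can be compared per thread count. This is a rough sketch using the figures reported in the two outputs; the dictionary and function names are made up for illustration.

```python
# Triad rates in MB/s by thread count, copied from the two outputs above.
ht_off = {  # HT disabled from the BIOS (array size 43700596)
    1: 12282.1773, 2: 21066.0085, 3: 23206.8603,
    4: 23555.5498, 5: 23424.7239, 6: 23298.1809,
}
ht_on = {  # HT enabled (array size 87401192)
    1: 12273.3535, 2: 21378.2018, 3: 24124.2561,
    4: 24631.0616, 5: 24246.1186, 6: 23878.2058,
    7: 22625.2297, 8: 23126.5826, 9: 23425.1605,
    10: 22919.8752, 11: 23267.4353, 12: 23229.3473,
}


def percent_diff(on: dict[int, float], off: dict[int, float]) -> dict[int, float]:
    """Percentage by which 'on' exceeds 'off' at each shared thread count."""
    shared = sorted(on.keys() & off.keys())
    return {n: (on[n] - off[n]) / off[n] * 100.0 for n in shared}


diff = percent_diff(ht_on, ht_off)
for n, pct in diff.items():
    print(f"{n} threads: HT on vs HT off {pct:+.2f}%")
```

Both runs peak at 4 threads, and the differences come out at a few percent at most. The comparison is not strictly like for like, because stream-scaling sized the arrays differently in the two runs (1000.2 MB vs 2000.5 MB of total memory required).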
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance