While everybody else was talking about a new software release or
something today, I was busy finally nailing down something elusive that
pops up on this list regularly. A few weeks ago we had a thread
named "Performance on new 64bit server compared to my 32bit desktop"
discussing how memory speed and the number of cores active at a time are
related. This is always of interest for PostgreSQL performance in
particular, because any one query can only execute on a single core at a
time. If your workload tends toward small numbers of long queries, the
ability of your server to handle high memory bandwidth across lots of
cores doesn't matter as much as the bandwidth a single core can marshal.

The main program used to determine peak memory bandwidth is STREAM,
available at http://www.cs.virginia.edu/stream/
The thing I never see anybody doing is running that with increasing core
counts and showing the performance scaling. Another annoyance is that
you have to be extremely careful to test with enough memory to exceed
the sum of all caching on the processors by a large amount, or your
results will be quite inflated.
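To make the scaling idea concrete, here is a rough Python sketch of running a STREAM binary once per thread count and pulling out the Triad rate. It is not the code from any existing tool. It assumes a hypothetical binary at ./stream that was built with OpenMP (so that the standard OMP_NUM_THREADS variable controls the thread count), and it assumes the results line starts with "Triad:" followed by the rate in MB/s.

```python
import os
import subprocess


def thread_counts(max_cores):
    """Return the core counts to test: 1, 2, ..., max_cores."""
    return list(range(1, max_cores + 1))


def parse_triad_rate(stream_output):
    """Extract the Triad rate (MB/s) from STREAM's text output, or None."""
    for line in stream_output.splitlines():
        fields = line.split()
        # Assumed format: "Triad:  <rate>  <avg time>  <min time>  <max time>"
        if len(fields) >= 2 and fields[0] == "Triad:":
            return float(fields[1])
    return None


def run_scaling(binary="./stream", max_cores=None):
    """Run the (hypothetical) binary per thread count; return {threads: MB/s}."""
    if max_cores is None:
        max_cores = os.cpu_count() or 1
    results = {}
    for n in thread_counts(max_cores):
        # OMP_NUM_THREADS is the standard OpenMP thread-count variable.
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        out = subprocess.run([binary], env=env, capture_output=True,
                             text=True, check=True).stdout
        results[n] = parse_triad_rate(out)
    return results
```

Calling run_scaling() on a machine with such a binary would give one Triad figure per core count, which is the scaling curve described above.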
I believe I have whipped both of these problems for Linux systems having
gcc 4.2 or later, and the code to test is now available at:
http://github.com/gregs1104/stream-scaling
It adds up all of the cache sizes, increases that by a whole order of
magnitude to compute the test size to really minimize their impact, and
chugs away more or less automatically trying all the core counts.
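The sizing rule is simple arithmetic, sketched here in Python. This is an illustration, not the script's actual code. The function names are made up, the size strings are assumed to look like "32K" or "8M", and I assume the tenfold figure applies to each array of 8-byte doubles; the real script may define the test size differently. The caller is responsible for listing each physical cache once, so shared caches are not double counted.

```python
def parse_cache_size(text):
    """Convert a size string such as '32K' or '8M' to bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)  # plain number: already in bytes


def stream_array_elements(cache_sizes, multiplier=10, bytes_per_element=8):
    """Elements per array so one array is `multiplier` times total cache.

    cache_sizes: sizes in bytes, one entry per distinct physical cache.
    multiplier: assumed order-of-magnitude factor (10).
    bytes_per_element: 8 for the double-precision values STREAM uses.
    """
    total_cache = sum(cache_sizes)
    return (total_cache * multiplier) // bytes_per_element


# Example: one core with 32K L1 data, 256K L2, and an 8M shared L3.
sizes = [parse_cache_size(s) for s in ("32K", "256K", "8M")]
print(stream_array_elements(sizes))  # 10854400
```

With several sockets the list simply gets longer, and the array size grows with it.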
The documentation includes an initial 6 systems I was able to get
samples for, and they show a lot of the common things I've noticed
before quite nicely: the upper limits of DDR2 systems even when you
have lots of banks, and how amazingly fast speeds to a single core are
with recent Intel+DDR3/1600 systems. All the stuff I've measured at a
higher level is really highlighted by this low-level test.
Given the changes to the test method and size computations since the
earlier tests posted in the past thread here, I'm afraid I can't include
any of those results in the table. Note that this includes the newer
48-core AMD server that Scott Marlowe posted results from earlier. The
one you see in my README.rst sample results is not it; that's an older
system with 8 sockets, fewer cores per processor, and slower RAM.
There's still some concern in my mind about whether the test size was
really big enough in the earlier sample Scott submitted to the list, and
he actually has to do real work on that server for the moment before he
can re-test. Will get that filled in eventually.
If any of you who are into this sort of thing would like to contribute a
result, I'd like to see the following (off-list please, I'll summarize on the
page later, and let me know if you want to be credited or anonymous for
the contribution):
-Full output from the stream-scaling run
-Output of "cat /proc/cpuinfo" on your server
-Total amount of RAM in the server (including the output from "free"
will suffice)
-RAM topology and speed, if you know it. I can guess that in some cases
if you don't know.
--
Greg Smith, 2ndQuadrant US greg@xxxxxxxxxxxxxxx Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)