Forgive me if this has been beaten into the ground, but my
team and I couldn’t find many conclusive studies or posts on this
issue. To make a long story short: we’re seeing Xeons run 50%
slower than Opterons, even when the Xeon has twice as much cache and a slight
clock speed advantage.

The full story: we have an older production server with 2G
of RAM, 2.4GHz Opterons w/ 1M of cache. The database is not large, only
around 7M or 8M rows altogether, 2.5G on disk. Most queries are reads,
probably in a 10:1 ratio to writes. In the process of upgrading this
server to a pair of DRBD-mirrored (more on this below) servers we discovered
that the new servers were actually slower than the older one. The newer
servers have 4G of RAM, 3.0GHz Xeons with 2M of cache. And not just a
little slower, but queries (simple, complex, and disgusting recursive stored
procedures) routinely run in 50-100% more time than they did on the older
server. After trying many things (downgrading the kernel to that
of the older machine, verifying version parity, copying the binary from the
older server, building a 32-bit binary on the new servers, running the entire
database out of a ramdisk, and of course much tweaking of postgresql.conf) and
seeing virtually no benefit from any of them, I finally took the last
leap: pull the disks and throw them in a newer Opteron chassis (2.8GHz, 1M
cache). And whaddya know? It’s got a 20% speed edge on the
older Opteron, and blows away the performance of the newer Xeons.

One of my guys did some testing and it appears that
LWLockAcquire and LWLockRelease are the culprits, but we’re not entirely
confident of our conclusion. Any thoughts on why this might be so
different between the two architectures? We’re a hosting provider
so we’ve got some spare equipment to work with and I’m going to
request that we keep these two boxes up for a week or so. Are there any
other tests that you guys can suggest that would help get to the bottom of
this? I figure that not everyone has access to as much gear as we do so
it might be a good opportunity to get some A/B testing on a production database
on identical OS/server installs on different hardware. I’m content
to just say “Well, we use Opterons then!”, but I imagine that if we
could help bring equal performance to Xeon users, it would be worth the effort
of volunteering. To be clear, I have two machines sitting on the network
ready for tweaking: one is a Xeon, the other is an Opteron. Neither is in
production and both can be fully mangled in the interest of figuring this out.

Speaking of being a hosting provider, I may as well take a
moment to point out that we are working with DRBD for mirroring and have found
it works beautifully with PG (MySQL as well). Also, while our “Managed
Database Service” product is geared around MySQL, Oracle, and MSSQL, we’re
pretty familiar with PG and would be happy to talk to anyone about hosting
needs they may have.

Thanks for listening, and again please let me know if there
is further testing we can do to help get to the bottom of this Opteron/Xeon
performance discrepancy.

Bart Grantham
VP of R&D
Logicworks, Inc.
www.logicworks.net