On Tue, Apr 24, 2012 at 1:32 PM, Shaun Thomas <sthomas@xxxxxxxxx> wrote:
On 04/23/2012 09:56 PM, Jan Nielsen wrote:
The new hardware for the 50GB PG 9.0 machine is:
* 24 cores across 2 sockets
* 64 GB RAM
* 10 x 15k SAS drives on SAN
* 1 x 15k SAS drive local
* CentOS 6.2 (2.6.32 kernel)

This is a pretty good build. Nice and middle-of-the-road for current hardware. I think it's probably relevant what your "24 cores across 2 sockets" are, though. Then again, based on the 24 cores, I have to assume you've got hex-core Xeons of some sort, with hyperthreading. That suggests a higher-end Westmere Xeon, like the X5645 or higher. If that's the case, you're in good hands.
The processors are Intel(R) Xeon(R) CPU X5650 @ 2.67GHz.
As a note, though... make sure you enable Turbo and other performance settings (disable power-down of unused CPUs, etc) in the BIOS when setting this up. We found that the defaults for the CPUs did not allow processor scaling, and it was far too aggressive in cycling down cores, such that cycling them back up had a non-zero cost. We saw roughly a 20% improvement by forcing the CPUs into full online performance mode.
Is there a way to tell what the BIOS power-down settings are for the cores from the CLI?
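Not the BIOS settings themselves, but the kernel's view of them is visible from the CLI via the cpufreq sysfs interface (a sketch; this assumes a cpufreq driver is loaded, which may not be the case if the BIOS keeps control of the power states):

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor   # current governor per core
grep "cpu MHz" /proc/cpuinfo                                # clock each core is running at right now
# force full performance on every core (assumes the 'performance' governor is available):
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance > $g
done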
We are considering the following drive allocations:

* 4 x 15k SAS drives, XFS, RAID 10 on SAN for PG data
* 4 x 15k SAS drives, XFS, RAID 10 on SAN for PG indexes
* 2 x 15k SAS drives, XFS, RAID 1 on SAN for PG xlog
* 1 x 15k SAS drive, XFS, on local storage for OS

Please don't do this. If you have the system you just described, give yourself an 8x RAID10 and the 2x RAID1. I've found that your indexes will generally be about 1/3 to 1/2 the total size of your database. So not only does your data partition lose read spindles, but you've wasted 1/2 to 2/3 of your active drive space. This may not be a concern based on your data growth curves, but it could be.
After reading Richard Foote's articles that Robert Klemme referenced in the previous post, I'm convinced.
In addition, add another OS drive and put it into a RAID-1. If you have server-class hardware, you'll want that extra drive. I'm frankly surprised you were even able to acquire a dual Xeon class server without a RAID-1 for OS data by default.
Agreed.
I'm not sure if you've done metrics or not, but XFS performance is highly dependent on your init and mount options. I can give you some guidelines there, but one of the major changes is that the Linux 3.X kernels have some impressive performance improvements you won't see using CentOS 6.2. Metadata in particular has undergone a massive upgrade that drastically enhances its parallel scalability on metadata modifications.
Alas, a 3.x Linux kernel would be nice but I'm stuck with CentOS 6.2 on 2.6.32. I would very much appreciate any guidelines you can provide.
If possible, you might consider the new Ubuntu 12.04 LTS that's coming out soon. It should have the newer XFS performance. If not, consider injecting a newer kernel to the CentOS 6.2 install. And again, testing is the only way to know for sure.
And test with pgbench, if possible. I used this to get our XFS init and mount options, along with other OS/kernel settings.
Yes; that does seem important. I found this:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/xfs.txt;hb=HEAD
While I was planning to set 'noatime', I'm a bit stumped on most of the rest. Anyone with comparable hardware willing to share their settings as a starting point for my testing?
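For what it's worth, an illustrative starting point only (the device name, stripe geometry, and option values here are assumptions to be validated with pgbench, not settings from this thread):

# align XFS to the RAID10 stripe: su = controller stripe unit, sw = number of data spindles
mkfs.xfs -f -d su=64k,sw=4 -l size=128m,lazy-count=1 /dev/sdb1
# noatime/nodiratime cut metadata writes; larger log buffers help metadata-heavy loads
mount -t xfs -o noatime,nodiratime,logbufs=8,logbsize=256k,allocsize=1m /dev/sdb1 /pgdata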
You can have very different performance metrics from dd/bonnie than an actual use pattern from real DB usage. As a hint, before you run any of these tests, write a '3' to /proc/sys/vm/drop_caches and restart your PG instance. You want to test your drives, not your memory. :)
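Concretely, that reset step might look like this (the data directory path is an assumption; adjust to your install):

sync                                        # flush dirty pages so drop_caches can evict them
echo 3 > /proc/sys/vm/drop_caches           # as root: drop page cache, dentries, and inodes
pg_ctl -D /var/lib/pgsql/9.0/data restart   # restart PG so shared_buffers also starts cold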
kernel.shmall = 4,294,967,296 (commas added for clarity)
kernel.shmmax = 68,719,476,736 (commas added for clarity)
kernel.sem = 250 32000 32 128
vm.swappiness = 0
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5

Good. Though you might consider lowering dirty_background_ratio. At that setting, it won't even try to write out data until you have about 3GB of dirty pages. Even high-end disk controllers only have 1GB of local capacitor-backed cache. If you really do have a good SAN, it probably has more than that, but try to induce a high-turnover database test to see what happens during heavy IO. A heavy, long-running pgbench run, for instance, should invoke several checkpoints and also flood the local write cache. When that happens, monitor /proc/meminfo, like this:
grep -A1 Dirty /proc/meminfo
That will tell you how much of your memory is dirty, but the 'Writeback' entry is what you care about. If you see that as a non-zero value for more than one consecutive check, you've saturated your write bandwidth to the point performance will suffer. But the only way you can really know any of this is with testing. Some SANs scale incredibly well to large pool flushes, and others don't.
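To keep an eye on it continuously during a run, something like:

watch -n 5 'grep -A1 Dirty /proc/meminfo'   # refresh Dirty/Writeback every 5 seconds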
Also, make iostat your friend. Particularly with the -x option. During your testing, keep one of these running in the background for the devices on your SAN. Watch your %util column in particular. Graph it, if you can. You can almost build a complete performance profile for different workloads before you put a single byte of real data on this hardware.
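For example (device names are assumptions; substitute your SAN LUNs):

# extended device stats every 5 seconds, appended to a log for later graphing
iostat -dx 5 sdb sdc >> iostat-bench.log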
If there are "obviously correct" choices in PG configuration, this would be tremendously helpful information to me. I'm planning on using pgbench to test the configuration options.

You sound like you've read up on this quite a bit. Greg's book is a very good thing to have and learn from. It'll cover all the basics about the postgresql.conf file. I don't see how I could add much to that, so just pay attention to what he says. :)
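A minimal pgbench recipe for that kind of testing might look like this (the scale factor and run length are assumptions, sized so the data set exceeds the 64GB of RAM):

createdb pgtest                    # throwaway benchmark database (name is arbitrary)
pgbench -i -s 4000 pgtest          # ~60GB of test data; scale 1 is roughly 15MB
pgbench -c 24 -j 4 -T 3600 pgtest  # 24 clients to match cores, one-hour read/write run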
I'm doing my best but the numbers will tell the story. :-)
Thanks for your review and feedback, Shaun.
Cheers,
Jan
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@xxxxxxxxx