Re: help tuning queries on large database

Ron <rjpeace@xxxxxxxxxxxxx> · Sun, 08 Jan 2006 16:35:11 -0500

I'll second all of Luke Lonergan's comments and add these.

You should be able to increase both "cold" and "warm" performance (as 
well as data integrity.  read below.) considerably.
Ron

At 05:59 PM 1/6/2006, peter royal wrote:
Howdy.

I'm running into scaling problems when testing with a 16gb (data 
+indexes) database.

I can run a query, and it returns in a few seconds. If I run it
again, it returns in a few milliseconds. I realize this is because
during subsequent runs, the necessary disk pages have been cached by
the OS.

I have experimented with having all 8 disks in a single RAID0 set, a
single RAID10 set, and currently 4 RAID0 sets of 2 disks each. There
hasn't been an appreciable difference in the overall performance of
my test suite (which randomly generates queries like the samples
below as well as a few other types. this problem manifests itself on
other queries in the test suite as well).

So, my question is, is there anything I can do to boost performance
with what I've got, or am I in a position where the only 'fix' is
more faster disks? I can't think of any schema/index changes that
would help, since everything looks pretty optimal from the 'explain
analyze' output. I'd like to get a 10x improvement when querying from
the 'cold' state.

Thanks for any assistance. The advice from reading this list to
getting to where I am now has been invaluable.
-peter

Configuration:

PostgreSQL 8.1.1

shared_buffers = 10000  # (It was higher, 50k, but didn't help any,
so brought down to free ram for disk cache)
work_mem = 8196
random_page_cost = 3
effective_cache_size = 250000

Hardware:

CentOS 4.2 (Linux 2.6.9-22.0.1.ELsmp)

Upgrade your kernel to at least  2.6.12
There's a known issue with earlier versions of the 2.6.x kernel and 
64b CPUs like the Opteron.  See kernel.org for details.

Areca ARC-1220 8-port PCI-E controller

Make sure you have 1GB or 2GB of cache.  Get the battery backup and 
set the cache for write back rather than write through.

8 x Hitachi Deskstar 7K80 (SATA2) (7200rpm)
2 x Opteron 242 @ 1.6ghz
3gb RAM (should be 4gb, but separate Linux issue preventing us from
getting it to see all of it)
Tyan Thunder K8WE
The K8WE has 8 DIMM slots.  That should be good for 16 or 32 GB of 
RAM (Depending on whether the mainboard recognizes 4GB DIMMs or 
not.  Ask Tyan about the latest K8WE firmare.).  If nothing else, 1GB 
DIMMs are now so cheap that you should have no problems having 8GB on the K8WE.

A 2.6.12 or later based Linux distro should have NO problems using 
more than 4GB or RAM.

Among the other tricks having lots of RAM allows:
If some of your tables are Read Only or VERY rarely written to, you 
can preload them at boot time and make them RAM resident using the 
/etc/tmpfs trick.

In addition there is at least one company making a cheap battery 
backed PCI-X card that can hold up to 4GB of RAM and pretend to be a 
small HD to the OS.  I don't remember any names at the moment, but 
there have been posts here and at storage.review.com on such products.

RAID Layout:

4 2-disk RAID0 sets created
You do know that a RAID 0 set provides _worse_ data protection than a 
single HD?  Don't use RAID 0 for any data you want kept reliably.

With 8 HDs, the best config is probably
1 2HD RAID 1 + 1 6HD RAID 10  or
2 4HD RAID 10's

It is certainly true that once you have done everything you can with 
RAM, the next set of HW optimizations is to add HDs.  The more the 
better up to a the limits of your available PCI-X bandwidth.

In short, a 2nd RAID fully populated controller is not unreasonable.