Re: Volunteer to build a configuration tool

Heikki Linnakangas <heikki@xxxxxxxxxxxxxxxx> · Wed, 20 Jun 2007 17:59:38 +0100

Campbell, Lance wrote:
It is amazing how many times you can read something before it actually
sinks in.

There seems to be two possible approaches to optimizing PostgreSQL 8.2:

Right.

File caching approach:
This approach is based on the fact that the OS will cache the necessary
PostgreSQL files.  The key here is to set the size of
effective_cache_size value as high as you think the OS has memory to
cache the files.  This approach would need the value of shared_buffers
to be relatively low.  Otherwise you are in a cense storing the data
twice.  One would also have to make sure that work_mem is not too high.
Since the files would be cached by the OS, work_mem could be relatively
low.  This is an ideal approach if you have a dedicated server since
there would be no other software using memory or accessing files that
the OS would try to cache.

There's no particular danger in setting work_mem too high in this 
approach. In fact, it's more important avoid a too large worm_mem 
setting with the other approach, because if you set it too high you can 
force the system to swap, while with the "file caching approach" the OS 
will just evict some of the cached pages to make room for sorts etc.

Memory driven approach:
In this approach you want to create a large value for shared_buffers.
You are relying on shared_buffers to hold the most commonly accessed
disk blocks.  The value for effective_cache_size would be relatively
small since you are not relying on the OS to cache files.

effective_cache_size should be set to the estimated amount of memory 
available for caching, *including* shared_buffers. So it should be set 
to a similar value in both approaches.

 This seems
like it would be the ideal situation if you have other applications
running on the box.

Actually it's the opposite. If there's other applications competing for 
the memory, it's better to let the OS manage the cache because it can 
make decisions on which pages to keep in cache and which to evict across 
all applications.

 By setting shared_buffers to a high value you are
guaranteeing memory available to PostgreSQL (this assumes the other
applications did not suck up to much memory to make your OS use virtual
memory).  

You're guaranteeing memory available to PostgreSQL, at the cost of said 
memory being unavailable from other applications. Or as you point out, 
in the worst case you end up swapping.

Do I understand the possible optimization paths correctly?  The only
question I have about this approach is: if I use the "memory driven
approach" since effective_cache_size would be small I would assume I
would need to fiddle with random_page_cost since there would be know way
for PostgreSQL to know I have a well configured system.

I don't see how effective_cache_size or the other settings affect 
random_page_cost. random_page_cost should mostly depend on your I/O 
hardware, though I think it's common practice to lower it when your 
database is small enough to fit mostly or completely in cache on the 
grounds that random access in memory is almost as fast as sequential access.

If everything I said is correct then I agree "Why have
effective_cache_size?"  Why not just go down the approach that Oracle
has taken and require people to rely more on shared_buffers and the
general memory driven approach?  Why rely on the disk caching of the OS?
Memory is only getting cheaper.

That has been discussed before many times, search the archives on direct 
I/O for previous flamewars on that subject. In a nutshell, we rely on 
the OS to not only do caching for us, but I/O scheduling and readahead 
as well. That saves us a lot of code, and the OS is in a better position 
to do that as well, because it knows the I/O hardware and disk layout so 
that it can issue the I/O requests in the most efficient way.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com