On Dec 6, 2007, at 1:22 PM, Weber, Geoffrey M. wrote:
I've been trying for quite a while to get Postgresql tuned for use
as an OLTP system. I have several PL/pgSQL functions that handle
inserts and updates to the main table, and several near-real-time
daemons that access the data and can take automated actions
on it (email/page concerned people, get complementary information
from a different system, etc.). I started with Postgres 8.1 and am
now using 8.2.4 (and have been since its release). I'll try to
provide enough information for a decent response, but as I can't
obviously put my entire schema and database out there, I'm hoping
that I can get some decent guidelines beyond what I've found
through Google, etc., to get this thing tuned better.
Most of the data centers on a central table that has 23 columns,
1 constraint, and 9 indexes. 4 of the indexes are partial. The
table usually contains about 3-4 million rows, but I've cut it down
to 1.2 million (cut out 2/3 of the data) in an effort to migrate
the database to a 2nd server for more testing. The two partial
indexes are used the most: one of 242MB that is accessed nearly
constantly, and one of 15MB that is read every 5 seconds but is
also updated constantly by the inserts that go through the 242MB
index. Other than one other 25MB index, the others
seem to average around 300MB each, but these aren't used quite as
often (usually about every minute or so).
Do you really have that many query patterns that you need 9 different
indexes on the table? Without seeing the table layout, I suspect
that a 23-column table with 9 indexes could do with some
normalization, vertical partitioning, or fewer indexes.
There is definitely such a thing as too many indexes, especially if
any of them share indexed columns.
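If you want hard numbers on which of those 9 indexes actually earn
their keep, the stats collector tracks per-index scan counts. A
quick sketch ('your_central_table' is a placeholder for your table
name):

    SELECT indexrelname, idx_scan, idx_tup_read
    FROM pg_stat_user_indexes
    WHERE relname = 'your_central_table'
    ORDER BY idx_scan;

Indexes whose idx_scan stays at or near zero over a representative
stretch of workload are candidates for dropping; every one of them
still has to be updated on each insert.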
My problems really are with performance consistency. I have
tweaked the execution so that everything should run with sub-second
execution times, but even after everything is running well, I can
get at most a week or two of steady running before things start to
degrade.
In my efforts to correct these execution-consistency problems, I
have gone from vacuuming (with analyze) twice a day to vacuuming
every 30 minutes (which is also how long a single VACUUM ANALYZE
takes to run; that in itself seems like a problem, since it
shouldn't take so long).
That really depends on the table size and the kind of traffic it sees.
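Rather than a fixed schedule, you could also let 8.2's autovacuum
pick the frequency per table based on actual churn. A minimal
postgresql.conf sketch (these values are just starting points, not
recommendations for your workload):

    stats_start_collector = on   # both required for autovacuum on 8.2
    stats_row_level = on
    autovacuum = on
    autovacuum_naptime = 60                # seconds between checks
    autovacuum_vacuum_scale_factor = 0.2   # vacuum after ~20% of rows change
    autovacuum_analyze_scale_factor = 0.1  # analyze after ~10% of rows change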
I've done a VACUUM FULL and that sometimes helps, but obviously is
not a long-term solution for a true OLTP system. Why I said
"sometimes helps": earlier this week I did a VACUUM FULL, VACUUM
ANALYZE on the entire schema. A query that was supposed to use the
25MB index above (that matched the parameters of the partial index
exactly) was still not doing so. I had to DROP and re-CREATE the
index (and do another ANALYZE) to get the database to (correctly)
use the index.
VACUUM FULL rewrites large portions of your table in order to compact
space. Since you're physically moving the tuples around it requires
updating the indexes on the table and thus can be a fast track to
bloated indexes. The VACUUM FULL docs even point this out and
suggest running REINDEX after a VACUUM FULL. You do realize that we
have a REINDEX command, right? There's no need to drop and recreate
separately when they are consecutive actions; just use REINDEX.
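For example (the index and table names here are placeholders for
yours):

    REINDEX INDEX my_25mb_partial_idx;
    -- or rebuild all of the table's indexes in one shot:
    REINDEX TABLE my_central_table;
    ANALYZE my_central_table;

Keep in mind that REINDEX locks out writes to the table (though not
reads) while it runs, so it's best done in a quiet window.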
Another problem: sometimes I get these unexplainable delays in
using a 'stored procedure' even though all its SQL calls run just
fine individually. For example, calling a particular function will
take 1 minute to execute even though entering the SQL commands
individually through psql will total up to about 1 second. When I
log 'long-duration queries', I only see the top function call in
the log. When I put 'RAISE NOTICE' statements in the various PL/
pgSQL functions, I only see them displayed for the function I call
directly; any underlying functions called from that function does
not show any of their NOTICE statements. Because of this I can't
track down where the delays are occurring for me to try and correct
the problem (not that there should be one in the first place)! I
should also note that there is not any lock contention (double-
checked with pgadmin3) showing that would "artificially" delay the
queries...
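One common cause of exactly this symptom is PL/pgSQL's plan
caching: the function's queries are planned once, without knowledge
of the actual parameter values, and that generic plan can be far
worse than the plan psql builds for the literal values you type in.
A minimal sketch of how you might time each step and force a fresh
plan with EXECUTE (the function, table, and column names here are
hypothetical):

    CREATE OR REPLACE FUNCTION debug_timing(p_id integer) RETURNS void AS $$
    DECLARE
        t0 timestamptz;
    BEGIN
        -- time the normal (cached-plan) form of the query
        t0 := clock_timestamp();
        PERFORM 1 FROM my_table WHERE some_col = p_id;
        RAISE NOTICE 'cached-plan query took %', clock_timestamp() - t0;

        -- EXECUTE replans from scratch using the actual value
        t0 := clock_timestamp();
        EXECUTE 'SELECT 1 FROM my_table WHERE some_col = ' || p_id;
        RAISE NOTICE 'replanned query took %', clock_timestamp() - t0;
    END;
    $$ LANGUAGE plpgsql;

Also, NOTICEs raised in nested functions normally do propagate to
the client, so if you're not seeing them it's worth double-checking
that client_min_messages is still at its default of notice.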
I've played with quite a few of the parameters like
vacuum_cost_delay, work_mem, shared_buffers, turned fsync off, etc.
in the postgresql.conf file. The server I've been given to use is
quite old (read: not the fastest anymore), but normally shows a
load average of 0.25 to 0.50 or so.
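Two notes on that list: turn fsync back on (with it off, a crash
can leave the whole cluster corrupt, which is no trade-off an OLTP
system can afford), and treat the memory knobs as a function of the
machine's RAM. A hedged postgresql.conf sketch for 8.2 (the numbers
are placeholders, not recommendations for your hardware):

    shared_buffers = 400MB        # ~25% of RAM is a common starting point
    work_mem = 8MB                # per sort/hash, per backend; keep modest
    maintenance_work_mem = 128MB  # speeds up VACUUM and index builds
    fsync = on                    # leave on; off risks unrecoverable corruption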
I guess my questions boil down to what I can do, beyond using
EXPLAIN, EXPLAIN ANALYZE, VACUUM ANALYZE, etc., to get my schema
performing acceptably across the board in a consistent manner. If
there is more information that I can give out that would improve
responses, please let me know.
As Merlin said, it would help to see some of those query plans along
with those tables' definitions. In addition, your memory- and
disk/I/O-related config settings along with the system specs would
help a lot. Without concrete examples and numbers, the answers to
all of your questions are "It depends -- on those concrete examples
and numbers."
Erik Jones
Software Developer | Emma®
erik@xxxxxxxxxx
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com