Re: Performace Optimization for Dummies

"Jim C. Nasby" <jim@xxxxxxxxx> · Thu, 28 Sep 2006 18:52:43 -0500

On Thu, Sep 28, 2006 at 02:04:21PM -0700, Steve Atkins wrote:
> I think you're confusing "explain" and "analyze". "Explain" gives you
> human readable output as to what the planner decided to do with the
> query you give it.

Don't forget about EXPLAIN ANALYZE, which is related to EXPLAIN but has
nothing to do with the ANALYZE command.

> indexes. I don't know whether autovacuum will also analyze tables
> for you automagically, but it would be a good idea to analyze the table

It does.

> >>Talking of which, are there indexes on the table? Normally you
> >>wouldn't have indexes in place during a bulk import, but if you're
> >>doing selects as part of the data load process then you'd be forcing
> >>sequential scans for every query, which would explain why it gets
> >>slower as the table gets bigger.
> >
> >There are indexes for every obvious "where this = that" clauses. I  
> >don't
> >believe that they will work for ilike expressions.
> 
> If you're doing a lot of "where foo ilike 'bar%'" queries, with the  
> pattern
> anchored to the left you might want to look at using a functional index
> on lower(foo) and rewriting the query to look like "where lower(foo)  
> like
> lower('bar%')".
> 
> Similarly if you have many queries where the pattern is anchored
> at the right of the string then a functional index on the reverse of the
> string can be useful.

tsearch might prove helpful... I'm not sure how it handles substrings.

Something else to consider... databases love doing bulk operations. It
might be useful to load prospective data into a temporary table, and
then do as many operations as you can locally (ie: within the database)
on that table, hopefully eleminating as many candidate rows as possible
along the way.

I also suspect that running multiple merge processes at once would help.
Right now, your workload looks something like this:

client sends query      database is idle
client is idle          database runs query
client gets query back  database is idle

Oversimplification, but you get the point. There's a lot of time spent
waiting on each side. If the import code is running on the server, you
should probably run one import process per CPU. If it's on an external
server, 2 per CPU would probably be better (and that might be faster
than running local on the server at that point).
-- 
Jim Nasby                                            jim@xxxxxxxxx
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)