On Thu, Sep 28, 2006 at 02:04:21PM -0700, Steve Atkins wrote:
> I think you're confusing "explain" and "analyze". "Explain" gives you
> human readable output as to what the planner decided to do with the
> query you give it.

Don't forget about EXPLAIN ANALYZE, which is related to EXPLAIN but has
nothing to do with the ANALYZE command.

> indexes. I don't know whether autovacuum will also analyze tables
> for you automagically, but it would be a good idea to analyze the table

It does.

> >>Talking of which, are there indexes on the table? Normally you
> >>wouldn't have indexes in place during a bulk import, but if you're
> >>doing selects as part of the data load process then you'd be forcing
> >>sequential scans for every query, which would explain why it gets
> >>slower as the table gets bigger.
> >
> >There are indexes for every obvious "where this = that" clauses. I
> >don't believe that they will work for ilike expressions.
>
> If you're doing a lot of "where foo ilike 'bar%'" queries, with the
> pattern anchored to the left you might want to look at using a
> functional index on lower(foo) and rewriting the query to look like
> "where lower(foo) like lower('bar%')".
>
> Similarly if you have many queries where the pattern is anchored
> at the right of the string then a functional index on the reverse of
> the string can be useful.

tsearch might prove helpful... I'm not sure how it handles substrings.

Something else to consider... databases love doing bulk operations. It
might be useful to load prospective data into a temporary table, and
then do as many operations as you can locally (i.e., within the database)
on that table, hopefully eliminating as many candidate rows as possible
along the way.

I also suspect that running multiple merge processes at once would help.
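To make the temporary-table suggestion above a bit more concrete, here's a
rough sketch. Everything named in it is made up for illustration: the
contacts and staging tables, the email/name columns, and the merge_batch
function are all hypothetical, and I'm using Python's built-in sqlite3
module purely so the example runs anywhere. On PostgreSQL you'd most
likely load the staging table with COPY instead, but the shape of the
SQL would be much the same.

```python
import sqlite3


def merge_batch(conn, rows):
    """Stage candidate rows, drop the known ones in bulk, insert the rest.

    Returns the number of rows added to the (hypothetical) contacts table.
    Duplicates inside the batch itself are not handled in this sketch.
    """
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE staging (email TEXT, name TEXT)")

    # One bulk load instead of one query per candidate row.
    cur.executemany("INSERT INTO staging (email, name) VALUES (?, ?)", rows)

    # Set-based elimination: discard candidates already in the target table,
    # comparing case-insensitively via lower().
    cur.execute(
        "DELETE FROM staging WHERE EXISTS ("
        " SELECT 1 FROM contacts c"
        " WHERE lower(c.email) = lower(staging.email))"
    )

    # Whatever survives is new, so merge it with a single statement.
    added = cur.execute("SELECT count(*) FROM staging").fetchone()[0]
    cur.execute(
        "INSERT INTO contacts (email, name) SELECT email, name FROM staging"
    )

    cur.execute("DROP TABLE staging")
    conn.commit()
    return added
```

The point is that the client sends a handful of statements per batch
rather than a select per row, and the database does the comparison work
in bulk.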
Right now, your workload looks something like this:

client sends query              database is idle
client is idle                  database runs query
client gets query back          database is idle

Oversimplification, but you get the point. There's a lot of time spent
waiting on each side. If the import code is running on the server, you
should probably run one import process per CPU. If it's on an external
server, 2 per CPU would probably be better (and that might be faster
than running local on the server at that point).
-- 
Jim Nasby                                            jim@xxxxxxxxx
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)