On Sep 28, 2006, at 12:10 PM, Carlo Stonebanks wrote:
Are you wrapping all this in a transaction?
Yes, the transactions can typically wrap 1 to 10 single-table,
single-row
inserts and updates.
You're doing some dynamically generated selects as part of the
"de-duping" process? They're probably the expensive bit. What
do those queries tend to look like?
Without a doubt, this is the expensive bit.
If you could give some samples of those queries here I suspect
people could be a lot more helpful with some optimisations, or
at least pinpoint where the performance issues are likely to be.
Are you analysing the table periodically? If not, then you might
have statistics based on an empty table, or default statistics, which
might cause the planner to choose bad plans for those selects.
Now there's something I didn't know - I thought that analysis and
planning
was done with every select, and the performance benefit of prepared
statements was to plan-once, execute many. I can easily put in a
periodic
analyse statement. I obviously missed how to use analyze properluy, I
thought it was just human-readable output - do I understand
correctly, that
it can be used to get the SQL server to revaluate its plan based on
newer
statistics - even on non-prepared queries?
I think you're confusing "explain" and "analyze". "Explain" gives you
human readable output as to what the planner decided to do with the
query you give it.
"Analyze" samples the data in tables and stores the statistical
distribution
of the data, and estimates of table size and that sort of thing for the
planner to use to decide on a good query plan. You need to run
analyze when the statistics or size of a table has changed
significantly,
so as to give the planner the best chance of choosing an appropriate
plan.
If you're not running analyze occasionally then the planner will be
working on default stats or empty table stats and will tend to avoid
indexes. I don't know whether autovacuum will also analyze tables
for you automagically, but it would be a good idea to analyze the table
every so often, especially early on in the load - as the stats
gathered for
a small table will likely give painful performance once the table has
grown a lot.
Talking of which, are there indexes on the table? Normally you
wouldn't have indexes in place during a bulk import, but if you're
doing selects as part of the data load process then you'd be forcing
sequential scans for every query, which would explain why it gets
slower as the table gets bigger.
There are indexes for every obvious "where this = that" clauses. I
don't
believe that they will work for ilike expressions.
If you're doing a lot of "where foo ilike 'bar%'" queries, with the
pattern
anchored to the left you might want to look at using a functional index
on lower(foo) and rewriting the query to look like "where lower(foo)
like
lower('bar%')".
Similarly if you have many queries where the pattern is anchored
at the right of the string then a functional index on the reverse of the
string can be useful.
Cheers,
Steve