On 9/28/06, Carlo Stonebanks <stonec.register@xxxxxxxxxxxx> wrote:
The deduplication process requires so many programmed procedures that it runs on the client. Most of the de-dupe lookups are not "straight" lookups, but calculated ones employing fuzzy logic. This is because we cannot dictate the format of our input data and must deduplicate with what we get. This was one of the reasons why I went with PostgreSQL in the first place: the server-side programming options. However, I saw incredible performance hits when running processes on the server, and I partially abandoned the idea (some custom-built name-comparison functions still run on the server).
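[As an aside, one common way to express that kind of fuzzy name comparison server-side is with the contrib/fuzzystrmatch module. The sketch below is only illustrative and assumes that module is installed; the table and column names (people, full_name) are hypothetical, not from the original post.]

```sql
-- Hedged sketch: fuzzy duplicate-candidate lookup on the server,
-- assuming contrib/fuzzystrmatch is installed.
-- Table "people" and column "full_name" are hypothetical.
CREATE OR REPLACE FUNCTION find_dupe_candidates(p_name text)
RETURNS SETOF people AS $$
  SELECT *
  FROM people
  WHERE levenshtein(lower(full_name), lower(p_name)) <= 2;
$$ LANGUAGE sql STABLE;
```

Note this scans the whole table per lookup; whether it beats a client-side comparison depends heavily on table size and round-trip costs, which may explain the performance hits described above.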
imo, the key to high performance big data movements in postgresql is mastering sql and pl/pgsql, especially the latter. once you get good at it, your net time of copy+plpgsql is going to be less than insert+tcl.

merlin
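[For readers unfamiliar with the copy+plpgsql pattern Merlin describes: the usual shape is to bulk-load raw rows into a staging table with COPY, then merge them into the real table in a single server-side pass. A minimal sketch, assuming hypothetical tables staging_people and people with a full_name column:]

```sql
-- Hedged sketch of the copy+plpgsql pattern.
-- All table, column, and file names here are hypothetical.

-- Step 1: bulk-load raw input with COPY (much faster than row-by-row INSERT).
COPY staging_people (full_name, phone) FROM '/tmp/input.csv' WITH CSV;

-- Step 2: merge staged rows into the main table in one server-side pass.
CREATE OR REPLACE FUNCTION merge_staging() RETURNS integer AS $$
DECLARE
  n integer := 0;
BEGIN
  INSERT INTO people (full_name, phone)
  SELECT s.full_name, s.phone
  FROM staging_people s
  WHERE NOT EXISTS (
    SELECT 1 FROM people p WHERE p.full_name = s.full_name
  );
  GET DIAGNOSTICS n = ROW_COUNT;  -- how many new rows were inserted
  TRUNCATE staging_people;
  RETURN n;
END;
$$ LANGUAGE plpgsql;
```

The win comes from doing one COPY plus one set-based statement instead of a network round trip per row; the exact match condition in the WHERE NOT EXISTS clause would be replaced by whatever fuzzy comparison the application actually needs.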