Carlo Stonebanks wrote: >> are you using the 'copy' interface? > > Straightforward inserts - the import data has to transformed, normalised and > de-duped by the import program. I imagine the copy interface is for more > straightforward data importing. These are - buy necessity - single row > inserts. > I know this is an answer to a question you didn't ask, but here it is. I was once doing stuff where I processed log files and had to do many lookups to normalize the data before insertion. I started out doing everything in SQL and using postgresql tables and it took a little over 24 hours to process 24 hours worth of data. Like you, it was single process, many lookups. I found a better way. I rewrote it (once in c#, again in python) and used hashtables/dictionaries instead of tables for the lookup data. For example, I'd start by loading the data into hash tables (yes, this took a *lot* of ram) then for each row I did something like: 1. is it in the hash table? 1. If not, insert it into the db 1. Insert it into the hashtable 2. Get the lookup field out of the hash table 3. Output normalized data This allow me to create text files containing the data in COPY format which can then be inserted into the database at dramatically increased speeds. My first version in C# (mono) cut the time down to 6 hours for 24 hours worth of data. I tweaked the algorithms and rewrote it in Python and got it down to 45 min. (Python can't take all the credit for the performance boost, I used an improved technique that could have been done in C# as well) This time included the time needed to do the copy and update the indexes. I created a version that also used gdb databases instead of hash tables. It increased the time from 45 min to a little over an hour but decreased the memory usage to something like 45MB (vs dozens or hundreds of MB per hashtable) -- Matthew Nuzum newz2000 on freenode