[TLM] Re: How to insert on duplicate key?

Greg Smith <gsmith@xxxxxxxxxxxxx> · Tue, 25 Dec 2007 02:15:15 -0500 (EST)

On Tue, 25 Dec 2007, fdu.xiaojf@xxxxxxxxx wrote:

> insert a record into a table, and when the record already
> exists (according to the primary key), update it.

There is an example that does exactly that, Example 37-1, in the
documentation at
http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html 
It does the UPDATE first and does the INSERT only if no row was updated,
so the common case never hits a duplicate key error. If a concurrent
insert still causes a unique-key failure, the example catches the error
and retries the UPDATE.
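
To make the update-first idea concrete, here is a rough sketch in Python. 
It is not the PL/pgSQL function from the documentation: it uses the 
standard sqlite3 module as a stand-in so it can run without a server, and 
the table "db" with columns "a" and "b", the function name upsert, and the 
retry limit are all assumptions made for illustration.

```python
import sqlite3


def upsert(conn, key, value, max_retries=3):
    """Update the row for `key`; insert it only if no row was updated."""
    for _ in range(max_retries):
        # Try the UPDATE first; rowcount tells us whether a row matched.
        cur = conn.execute("UPDATE db SET b = ? WHERE a = ?", (value, key))
        if cur.rowcount > 0:
            return "updated"
        try:
            # No existing row, so try to insert one.
            conn.execute("INSERT INTO db (a, b) VALUES (?, ?)", (key, value))
            return "inserted"
        except sqlite3.IntegrityError:
            # Another session inserted the same key in between;
            # go around the loop and try the UPDATE again.
            continue
    raise RuntimeError("upsert did not succeed after %d attempts" % max_retries)


# Hypothetical table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db (a INTEGER PRIMARY KEY, b TEXT)")
```

Against PostgreSQL the same loop would normally live in a server-side 
function, as in the documentation example, so that each record costs one 
round trip instead of two.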

> I have tried the query and update/insert way, and it was very slow when
> more than 1 million records have been inserted. (I have more than 20
> million records to insert.)

This may be better because it isn't doing the query first.  You may 
discover that you need to aggressively run one of the VACUUM processes 
(I'd guess regular and ANALYZE but not FULL) in order to keep performance 
steady as the number of records grows.  Any time you update a row, the old 
version becomes a dead row that's still taking up space, and if you do a 
lot of updates the dead rows get in the way of finding the rows that are 
still live.  Take a look at 
http://www.postgresql.org/docs/current/interactive/routine-vacuuming.html 
to get an idea of the process.
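
One way to act on that during a long load is to pause for maintenance 
every so many records. The sketch below is only an outline in Python: the 
function name, the two callbacks, and the interval are assumptions, and 
the right interval for a 20 million row load would have to be found by 
measuring. In PostgreSQL the maintenance step would issue something like 
VACUUM ANALYZE on the target table, which has to run outside a transaction 
block.

```python
def load_with_maintenance(rows, upsert_row, maintain, every=100_000):
    """Apply upsert_row to each row, calling maintain() after every `every` rows.

    upsert_row and maintain are caller-supplied callables (hypothetical
    names); maintain would typically run VACUUM ANALYZE on the table.
    """
    done = 0
    for row in rows:
        upsert_row(row)
        done += 1
        if done % every == 0:
            # Reclaim dead rows and refresh planner statistics.
            maintain()
    return done
```

Passing the maintenance step in as a callback keeps the loading loop 
independent of how the database connection is set up.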

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD
