On 2/15/21 11:41 AM, Karthik K wrote:
exactly, for now, what I did was: since the table is already
partitioned, I created 50 different connections and updated the target
table by querying directly from the source partition tables. Are there
any other techniques I can use to speed this up? Also, when we use an
ON CONFLICT statement for both insert and update, does Postgres use
batching internally (committing every 10,000 records, etc.), or will
it update all records at once? In that case, does it create a version
for each record and swap them all at once? I'm wondering how atomicity
is guaranteed. Also, if I have to do batching beyond selecting from
individual partitions, does doing it in batches of 10,000 records help?
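
To answer the ON CONFLICT part: an INSERT ... ON CONFLICT DO UPDATE is
a single statement running in a single transaction, so Postgres does
not commit in internal sub-batches. MVCC writes a new version of every
updated row, and all of them become visible atomically at commit; if
you want 10,000-record commits, you have to batch on the client side.
A minimal sketch of the statement in question, assuming a hypothetical
target(id, val) and a staging table batch with the same columns:

begin;
insert into target as t (id, val)
select b.id, b.val
from batch b
on conflict (id) do update
  set val = excluded.val;   -- one statement: all rows or none
commit;
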
What is your ratio of inserts versus updates? Can you separate the
inserts and updates? Is the target table indexed on anything other
than the primary key? If so, can those indexes be dropped?
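
If the extra indexes can be dropped for the duration of the load,
rebuilding them afterwards is usually faster than maintaining them row
by row. A hypothetical example, assuming an index named target_val_idx:

drop index if exists target_val_idx;
-- ... run the bulk update/insert ...
create index target_val_idx on target (val);
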
Assuming you use \copy to load the batch tables, I've found this
strategy to be effective:
index batch on id

-- update first
begin;
update target t set "all fields"
  from batch b
  where t.id = b.id and b.id between "lo" and "hi";
commit;
increment lo and hi, avoiding overlap; repeat

-- then insert
begin;
insert into target
select b.* from batch b
where not exists (select 1 from target v where b.id = v.id)
  and b.id between "lo" and "hi";
commit;
increment lo and hi, avoiding overlap; repeat
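
Spelled out as runnable SQL, a sketch under assumed names (a
target(id, val) table, a batch staging table loaded via \copy from a
hypothetical batch.csv, and 10,000-id windows):

\copy batch (id, val) from 'batch.csv' with (format csv)

create index batch_id_idx on batch (id);
analyze batch;

-- update pass: one id window per transaction
begin;
update target t
   set val = b.val
  from batch b
 where t.id = b.id
   and b.id between 1 and 10000;
commit;
-- next window: between 10001 and 20000, and so on

-- insert pass over the same windows
begin;
insert into target (id, val)
select b.id, b.val
  from batch b
 where not exists (select 1 from target v where v.id = b.id)
   and b.id between 1 and 10000;
commit;

Keeping each window in its own transaction keeps locks and WAL bursts
small while still making each batch atomic.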