Copy Bulk Ignore Duplicated

Leandro Guimarães <leo.guimaraes@xxxxxxxxx> · Fri, 14 Jun 2019 18:04:35 -0300

Hi, 
   I have a scenario with a large table and I'm trying to insert it via a COPY command with a csv file.

   Everything works, but sometimes my source .csv file has duplicated data in the previously fulfilled table. If I add a check constraint and try to run the COPY command I have an error that stops the whole insertion.

  I've tried to put the data in a tmp table and fill the main using distinct this way (the fields and names are just examples):

INSERT INTO final_table values (name, document)
   SELECT DISTINCT name, document
   FROM tmp_TABLE t1 
   WHERE NOT EXISTS ( 
   SELECT 1 FROM final_table t2 
   WHERE (t2.name, t2.document) 
   IS NOT DISTINCT FROM (t1.name, t1.document))

The problem is that my final_table is a large (and partitioned) table and this query is taking a long time to execute.

Someone have any idea (really guys anything would be great) how to solve this situation? I need to ignore duplicates instead to have some error.

I'm using PostgreSQL 9.4 so I can't use "ON CONFLICT" and upgrade is not an option.

Thanks and Kind Regards!

Leandro Guimarães