Hi Arup On Sun, May 24, 2015 at 12:26 PM, Arup Rakshit <aruprakshit@xxxxxxxxxxxxxx> wrote: > I am copying the data from a CSV file to a Table using "COPY" command. But one thing that I got stuck, is how to skip duplicate records while copying from CSV to tables. By looking at the documentation, it seems, Postgresql don't have any inbuilt too to handle this with "copy" command. By doing Google I got below 1 idea to use temp table. > http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql > I am also thinking what if I let the records get inserted, and then delete the duplicate records from table as this post suggested - http://www.postgresql.org/message-id/37013500.DFF0A64A@xxxxxxxxxxxxxxxxxxxx. > Both of the solution looks like doing double work. But I am not sure which is the best solution here. Can anybody suggest which approach should I adopt ? Or if any better ideas you guys have on this task, please share. Without looking at your SO answers ( I suppose both consist on copying to a temp table, then either doing an 'insert into main table select * from temp where not exists...' or a delete from temp / insert into main , which is what I would suggest ) I would point out copy is not the tool for this kind of jobs. Copy is normally used for bulk loading of correct data, and is great at this. If you need to preclean data, you are not doing double work using one of those solutions. Many ways of doing it have been pointed, the exact one depends on your data. You've pointed there are duplicate rows, if this is true you can easily do that using text tools and dumps or the temporary table. There is also the solution of just turning your CSV with your favorite text handiling tool into a set of insert lines conditioned of inexistence of an appropiately indexed set of fields in the table ( which I've done but cannot quote exactly, it was an experiment and got quite hairy ). From my experience, go for the temporary table plus insert-select/delete+select route, it's easier, is normally faster ( supposing you have appropiate indexes ) and it really is no double work. You have dirty data ( duplicates ), you have to clean it before inserting. Or, you could turn the file into a series of plain inserts and feed it to psql in autocommit mode, if you have some unique condition on the table, so it errors out on every duplicate. It's not pretty but should work. I still recommend the temp table approach. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general