On 05/24/2015 04:55 AM, Arup Rakshit wrote:
On Sunday, May 24, 2015 02:52:47 PM you wrote:
On Sun, 2015-05-24 at 16:56 +0630, Arup Rakshit wrote:
Hi,
I am copying data from a CSV file to a table using the "COPY" command.
But one thing I am stuck on is how to skip duplicate records while
copying from the CSV to the table. Looking at the documentation, it seems
PostgreSQL doesn't have any built-in tool to handle this with the "COPY"
command. Searching Google, I found the idea below of using a temp table:
http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql
I am also thinking of letting the records get inserted and then
deleting the duplicate records from the table, as this post suggested -
http://www.postgresql.org/message-id/37013500.DFF0A64A@xxxxxxxxxxxxxxxxxxxx.
Both of these solutions look like doing double work, and I am not sure
which is the best one here. Can anybody suggest which approach
I should adopt? Or if you have any better ideas for this task,
please share.
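A minimal sketch of the insert-then-delete idea, for illustration only. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server; the table `table1` and its columns `id` and `name` are hypothetical, as is the data. SQLite's `rowid` plays the part that PostgreSQL's `ctid` system column would, so the DELETE statement would need adapting before use in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical target table with no unique constraint, so duplicates get in.
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO table1 (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (2, "bob"), (3, "carol"), (2, "bob")],
)
# Keep the first physical copy of each (id, name) pair and delete the rest.
# SQLite's rowid stands in for PostgreSQL's ctid here (assumption: a
# PostgreSQL version would need a ctid-based rewrite of this query).
conn.execute(
    """
    DELETE FROM table1
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM table1 GROUP BY id, name)
    """
)
conn.commit()
print(conn.execute("SELECT id, name FROM table1 ORDER BY id").fetchall())
# → [(1, 'alice'), (2, 'bob'), (3, 'carol')]
```

The duplicates are written and then removed again, which is the "double work" the question worries about; the upside is that only one statement is needed after the load.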
Assuming you are using Unix, or can install Unix tools, run the input
files through
sort -u
before passing them to COPY.
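A rough Python equivalent of that filter, assuming the CSV has a header line (the sample data is made up). Two caveats apply to `sort -u` itself: it sorts the lines, so a header row can end up somewhere other than line 1 (which matters if COPY is run with its HEADER option), and it only removes lines repeated within the input it is given, not rows already loaded into the table. This sketch keeps the header first and preserves the original line order.

```python
def unique_lines(csv_text: str) -> str:
    """Drop repeated data lines, keeping the header and first-seen order.

    Like `sort -u`, this compares whole lines as plain text, so
    "1,alice" and "1, alice" count as different lines.
    """
    lines = csv_text.splitlines()
    if not lines:
        return ""
    header, body = lines[0], lines[1:]
    seen = set()
    kept = [header]
    for line in body:
        if line not in seen:  # first time this exact line appears
            seen.add(line)
            kept.append(line)
    return "\n".join(kept) + "\n"


sample = "id,name\n2,bob\n1,alice\n2,bob\n"
print(unique_lines(sample), end="")
# id,name
# 2,bob
# 1,alice
```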
Oliver Elphick
I think I need to ask in a more specific way. I have a table, say `table1`, which I feed with data from different CSV files. Suppose I have inserted N records into `table1` from the CSV file `c1`. That is fine. Next time, when I import from a different CSV file, say `c2`, into `table1`, I don't want to reinsert any record from the new CSV file that `table1` already has.
How to do this?
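One way to handle this, sketched under assumptions: load each new file into a scratch (temporary) table first, then copy across only the rows whose key is not already in `table1`. Python's sqlite3 module stands in for PostgreSQL so the example is self-contained; in PostgreSQL the staging step would be a `COPY` into a temporary table. The column `id` is assumed to be what identifies a duplicate, and the contents of `c1` and `c2` are invented.

```python
import csv
import io
import sqlite3

# Hypothetical contents of the two CSV files c1 and c2 (illustration only).
C1 = "id,name\n1,alice\n2,bob\n"
C2 = "id,name\n2,bob\n3,carol\n3,carol\n"


def load_csv(conn, csv_text):
    """Load one CSV file into table1, skipping rows whose id already exists."""
    cur = conn.cursor()
    # Step 1: a scratch table with table1's columns (in PostgreSQL this would
    # be a TEMP table filled with COPY).
    cur.execute("CREATE TEMP TABLE staging (id INTEGER, name TEXT)")
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    cur.executemany("INSERT INTO staging (id, name) VALUES (:id, :name)", rows)
    # Step 2: copy across only rows whose id is not in table1 yet; DISTINCT
    # also drops exact repeats inside the file itself.
    cur.execute(
        """
        INSERT INTO table1 (id, name)
        SELECT DISTINCT s.id, s.name
        FROM staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM table1 AS t WHERE t.id = s.id)
        """
    )
    cur.execute("DROP TABLE staging")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
load_csv(conn, C1)
load_csv(conn, C2)
print(conn.execute("SELECT id, name FROM table1 ORDER BY id").fetchall())
# → [(1, 'alice'), (2, 'bob'), (3, 'carol')]
```

The check against `table1` is what handles rows from earlier files; DISTINCT alone would only catch repeats within `c2`.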
As others have pointed out this depends on what you are considering a
duplicate.
Is it if the entire row is duplicated?
Or if some portion of the row (a 'primary key') is duplicated?
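To make the distinction concrete, here is a small hypothetical sketch: the same incoming rows are filtered once treating only identical rows as duplicates, and once treating any row with an existing first column (assumed here to be the 'primary key') as a duplicate. The existing rows and the incoming data are made up.

```python
import csv
import io


def new_rows(existing, csv_text, key):
    """Return CSV rows whose key(row) is not in `existing` or earlier in the file."""
    seen = set(existing)
    out = []
    for row in csv.reader(io.StringIO(csv_text)):
        k = key(row)
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out


# Hypothetical rows already in table1, as tuples of CSV fields.
table_rows = [("1", "alice"), ("2", "bob")]
incoming = "2,bob\n2,robert\n3,carol\n"

# Whole-row duplicates: only an identical row is skipped.
whole = new_rows(table_rows, incoming, key=tuple)
# Key duplicates: any row whose first column already exists is skipped.
by_id = new_rows([r[0] for r in table_rows], incoming, key=lambda r: r[0])

print(whole)  # → [['2', 'robert'], ['3', 'carol']]
print(by_id)  # → [['3', 'carol']]
```

The row `2,robert` is the interesting case: it is new as a whole row but a duplicate by key, which is why the answer depends on the definition.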
My SO link is not a solution to my problem I see now.
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general