Search Postgresql Archives

Re: COPY from .csv File and Remove Duplicates

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 11 Aug 2011, David Johnston wrote:

If you have duplicates with matching real keys inserting into a staging
table and then moving new records to the final table is your best option
(in general it is better to do a two-step with a staging table since you
can readily use Postgresql to perform any intermediate translations) As
for the import itself,

  It was probably a couple of days extracting very messy data from Excel
spreadsheets and writing python and awk scripts to transform them that
caused me to miss the obvious: the multi-column primary key that I intended
to implement in the base table.

  Trying to add a compound primary key using (loc_name, sample_date, param)
shows there are duplicates in the original data. While there are many slight
variations on the SELECT syntax for finding duplicates based on a single
column, I've not found working syntax for finding duplicate rows based on
the values in all three columns.

  A pointer to the appropriate syntax for retrieving the entire row when
count(loc_name, sample_date, param) > 1 would be much appreciated.

Rich

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux