On Thu, 11 Aug 2011, David Johnston wrote:
If you have duplicates with matching real keys, inserting into a staging table and then moving new records to the final table is your best option (in general it is better to do a two-step with a staging table, since you can readily use PostgreSQL to perform any intermediate translations). As for the import itself,
It was probably a couple of days of extracting very messy data from Excel spreadsheets, and writing Python and awk scripts to transform them, that caused me to miss the obvious: the multi-column primary key that I intended to implement in the base table.

Trying to add a compound primary key on (loc_name, sample_date, param) shows there are duplicates in the original data. While there are many slight variations on the SELECT syntax for finding duplicates based on a single column, I've not found working syntax for finding duplicate rows based on the values in all three columns.

A pointer to the appropriate syntax for retrieving the entire row when count(loc_name, sample_date, param) > 1 would be much appreciated.

Rich
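
One possible approach, sketched here rather than tested against the real data: group on the three key columns, keep the groups with count(*) > 1, and join that result back to the table to retrieve the entire rows. The table name samples, the value column quant, and the sample rows are all hypothetical; only loc_name, sample_date, and param come from the post. The sketch runs the query through Python's sqlite3 module purely so that it is self-contained.

```python
import sqlite3

# Hypothetical table "samples" and value column "quant"; only the three
# key columns (loc_name, sample_date, param) are taken from the post.
DUPLICATES_SQL = """
SELECT s.*
FROM samples AS s
JOIN (
    -- one row per key combination that occurs more than once
    SELECT loc_name, sample_date, param
    FROM samples
    GROUP BY loc_name, sample_date, param
    HAVING count(*) > 1
) AS d
  ON  s.loc_name    = d.loc_name
  AND s.sample_date = d.sample_date
  AND s.param       = d.param
ORDER BY s.loc_name, s.sample_date, s.param, s.quant
"""


def find_duplicates(conn):
    """Return every full row whose three-column key appears more than once."""
    return conn.execute(DUPLICATES_SQL).fetchall()


def demo():
    """Build a small in-memory table with one duplicated key and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples ("
        "loc_name TEXT, sample_date TEXT, param TEXT, quant REAL)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?)",
        [
            ("site_a", "2011-01-05", "pH", 7.1),
            ("site_a", "2011-01-05", "pH", 7.3),   # same key as the row above
            ("site_a", "2011-01-05", "TDS", 410.0),
            ("site_b", "2011-01-05", "pH", 6.9),
        ],
    )
    rows = find_duplicates(conn)
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The SELECT itself is plain SQL and should run in psql once the hypothetical names are replaced with the real ones; in PostgreSQL the ON clause can also be shortened to USING (loc_name, sample_date, param). Rows with a NULL in any of the three columns will not match in the join, which should not matter for columns intended as a primary key.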