>> There is no true key, only an artificial key so I can ensure that rows are
>> unique. That's in the main table with the 50K rows. No key column in the
>> .csv file.

If you have no true key then you have no way to ensure uniqueness. By adding an artificial key, two records that are otherwise duplicates would now be considered unique. Since you have not given the data/structure for either the CSV or the main table, more specific help is not possible, but when using a serial PK, in almost every case the table should also have a candidate key with a UNIQUE index defined. If you cannot explain why yours does not, and why it cannot, I would offer that you need to gain further understanding of your data model.

It is generally wise to create a UNIQUE index on a candidate key and risk being wrong. At worst you will get an actual error and can always drop the UNIQUE index if the "duplicate" record should indeed be valid; in that situation you now have more data to feed into your model analysis and should be able to modify the table to create a new candidate key.

Slight tangent, but I have an external accounting source where I know that, with respect to the available data, duplicates can occur (a PK field is not available). Since I have no candidate key, I am forced to use an artificial (serial) key and take extra precautions to ensure I do not inadvertently introduce unintentional duplicate data during import.

In my case I handle data at the "day" level. My source gives me every transaction for a given date, and I then modify my live table to add only the correct number of records so that, after the merge process, I have an exact copy of the data in the source file. Thus, since I trust the source file (and cannot enter data via any other method), I know immediately after processing that any duplicates on a given date are expected duplicates, as opposed to, say, the result of accidentally importing the same file twice and ending up with twice as many records.

I also understand that if, say for reconciliation purposes, I need to choose one of a set of duplicate records, it does not matter initially which one I choose; afterwards, as long as I only add records, I can ensure that I always pick the same record in the future. However, if I am forced to DELETE a record, from a practical perspective I DELETE BOTH/ALL of the duplicates and then ADD back the correct number of records for that date. Any data that cared about the original records will then need to decide how to handle the fact that their record may no longer be present (instead of deleting only some of the existing records at random without knowing which ones are the "correct" ones to delete).

This is one example I've come across where the data I am working with has absolutely NO inherent PK that I can see, but where I can trust that, for a given dataset, I only have valid data. I did have to assign a SERIAL PK to my copy of the data, but I also recognized where problems could occur and mitigated them via specific processing routines.

One alternative solution would be to simply DELETE everything for a given date and then import every record from the source file into the main table. I rejected that solution because I could not afford to continually delete the existing records: other tables have FK relationships to them, and continually breaking those links (ON DELETE SET NULL) was unacceptable.
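For what it's worth, here is a rough sketch of the kind of count-matching insert I described above. The table and column names (acct_live, acct_stage, txn_date, account, amount, memo) are invented for illustration; yours would differ, and there are certainly other ways to express the same idea:

    -- All names are made up. acct_stage holds the day's source file;
    -- acct_live is the permanent table with the serial PK.
    -- Number the identical rows within the staging data, then insert only
    -- those copies beyond the count already present in the live table.
    INSERT INTO acct_live (txn_date, account, amount, memo)
    SELECT s.txn_date, s.account, s.amount, s.memo
    FROM (
        SELECT *,
               row_number() OVER (PARTITION BY txn_date, account, amount, memo) AS rn
        FROM acct_stage
    ) AS s
    WHERE s.rn > (
        SELECT count(*)
        FROM acct_live l
        WHERE l.txn_date = s.txn_date
          AND l.account  = s.account
          AND l.amount   = s.amount
          AND l.memo IS NOT DISTINCT FROM s.memo   -- NULL-safe comparison
    );

If the live table already holds as many (or more) copies of a given row as the source file for that date, nothing is inserted; the delete-and-re-add routine mentioned above covers the case where records must be removed.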
I still have to break those FK links when I need to delete a record (rare, given this is accounting data), but simply adding a new record does not affect existing records. Whether this situation mirrors yours I do not know, but I hope this brief description is at least informative and educational for you and others. Feedback/thoughts are greatly welcomed.

>> I presume what you call a staging table is what I refer to as a copy of
>> the main table, but with no key attribute.
>> Writing the SELECT statement to delete from the staging table those rows
>> that already exist in the main table is where I'm open to suggestions.

The big question to ask is how you would identify a record in the CSV file as already being on the main table (either directly or, as in my example above, indirectly).

My use of "staging table" reflects the fact that the structure of the table should roughly match the CSV file and NOT the main table. The SQL you issue to move records from the staging table to the main table will then account for any differences between the two. The general idea is to load up the staging table, optionally update "matching" records on the main table, insert non-matching records, then truncate/clear the staging table. The general structure for the insert would be:

    INSERT INTO maintable (cols)
    SELECT cols
    FROM staging
    WHERE staging.idcols NOT IN (SELECT maintable.idcols FROM maintable);

There may be more efficient ways to write the query, but the idea is the same.

David J.
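P.S. In case a fuller sketch of that load / update / insert / truncate cycle is useful: every table, column, and file name below is invented for illustration, and I've used NOT EXISTS rather than NOT IN since it behaves more predictably when the identifying columns can contain NULLs.

    BEGIN;

    -- 1. Load the raw CSV into the staging table, whose structure mirrors
    --    the file (use \copy from psql if the file lives on the client).
    COPY staging (cust_code, order_no, order_date, amount)
    FROM '/path/to/file.csv' WITH (FORMAT csv, HEADER true);

    -- 2. Optionally refresh rows that already exist on the main table.
    UPDATE maintable m
    SET    amount = s.amount
    FROM   staging s
    WHERE  m.cust_code = s.cust_code
      AND  m.order_no  = s.order_no;

    -- 3. Insert only the rows not already present on the main table.
    INSERT INTO maintable (cust_code, order_no, order_date, amount)
    SELECT s.cust_code, s.order_no, s.order_date, s.amount
    FROM   staging s
    WHERE  NOT EXISTS (
        SELECT 1
        FROM   maintable m
        WHERE  m.cust_code = s.cust_code
          AND  m.order_no  = s.order_no
    );

    -- 4. Clear the staging table for the next load.
    TRUNCATE staging;

    COMMIT;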