I have just completed the bulk upload of a large database. Some tables have billions of records, and no constraints or indexes have been applied yet. About 0.1% of the records may have been duplicated during the upload, and I need to remove them before applying constraints.

I understand there are (at least) two approaches to getting a table without duplicate records (sketched in the P.S. below) …

- Delete the duplicate records from the table based on an appropriate SELECT clause;
- Create a new table with the results of a SELECT DISTINCT clause, and then drop the original table.

What would be the most efficient procedure in PostgreSQL to do the job, considering that …

- I do not know which records were duplicated;
- There are no indexes on the tables yet;
- There are no OIDs on the tables yet;
- The database is currently 1 TB, but I have plenty of disk space.

Daniel
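
P.S. For concreteness, here is roughly what I mean by each approach. The table name `big_table` and the columns `col1, col2, col3` are placeholders for my real schema; I am assuming rows count as duplicates when all listed columns match.

```sql
-- Approach 1 (sketch): delete the duplicates in place, keeping one row per group.
-- The system column ctid is used to tell otherwise identical rows apart.
DELETE FROM big_table
WHERE ctid IN (
    SELECT ctid
    FROM (
        SELECT ctid,
               row_number() OVER (PARTITION BY col1, col2, col3) AS rn
        FROM big_table
    ) numbered
    WHERE rn > 1      -- every row after the first in each group is a duplicate
);

-- Approach 2 (sketch): rebuild the table from a DISTINCT copy, then swap it in.
CREATE TABLE big_table_dedup AS
SELECT DISTINCT col1, col2, col3
FROM big_table;

DROP TABLE big_table;
ALTER TABLE big_table_dedup RENAME TO big_table;
```

The second form needs room for a full extra copy of the table while both exist, which is why I mention the available disk space; the question is which of the two is more efficient at this scale.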