2010/6/1 Torsten Zühlsdorff <foo@xxxxxxxxxxxxxxxxxxx>: > Hello, > > i have a set of unique data which about 150.000.000 rows. Regullary i get a > list of data, which contains multiple times of rows than the already stored > one. Often around 2.000.000.000 rows. Within this rows are many duplicates > and often the set of already stored data. > I want to store just every entry, which is not within the already stored > one. Also i do not want to store duplicates. Example: > > Already stored set: > a,b,c > > Given set: > a,b,a,c,d,a,c,d,b > > Expected set after import: > a,b,c,d > > I now looking for a faster way for the import. At the moment i import the > new data with copy into an table 'import'. then i remove the duplicates and > insert every row which is not already known. after that import is truncated. > > Is there a faster way? Should i just insert every row and ignore it, if the > unique constrain fails? > > Here the simplified table-schema. in real life it's with partitions: > test=# \d urls > Tabelle »public.urls« > Spalte | Typ | Attribute > --------+---------+------------------------------------------------------- > url_id | integer | not null default nextval('urls_url_id_seq'::regclass) > url | text | not null > Indexe: > »urls_url« UNIQUE, btree (url) > »urls_url_id« btree (url_id) > > Thanks for every hint or advice! :) I think you need to have a look at pgloader. It does COPY with error handling. very effective. http://pgloader.projects.postgresql.org/ > > Greetings from Germany, > Torsten > -- > http://www.dddbl.de - ein Datenbank-Layer, der die Arbeit mit 8 > verschiedenen Datenbanksystemen abstrahiert, > Queries von Applikationen trennt und automatisch die Query-Ergebnisse > auswerten kann. > > -- > Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance