Rob W wrote:
> I am using COPY to bulk load large volumes (multi-GB range) of
> data into a staging table in a PostgreSQL 8.3 database. For
> performance, the staging table has no constraints, no primary
> key, etc. I want to move that data into the "real" tables,
> but need some advice on how to do that efficiently.
>
> Here's a simple, abbreviated example of the tables and relations
> I'm working with (in reality there are a lot more columns and
> foreign keys):
>
> /* The raw bulk-loaded data. No indexes or constraints. */
> CREATE TABLE log_entry (
>     req_time TIMESTAMP NOT NULL,
>     url      TEXT NOT NULL,
>     bytes    INTEGER NOT NULL
> );
>
> /* Where the data will be moved to. Will have indexes, etc. */
> CREATE TABLE url (
>     id   SERIAL PRIMARY KEY,
>     path TEXT UNIQUE NOT NULL
> );
>
> CREATE TABLE request (
>     id       BIGSERIAL PRIMARY KEY,
>     req_time TIMESTAMP WITH TIME ZONE NOT NULL,
>     bytes    INTEGER NOT NULL,
>     fk_url   INTEGER NOT NULL REFERENCES url
> );
>
> Is there a way to move this data in bulk efficiently?
> Specifically, how should I handle the foreign keys?
> The naive approach is:
>
> 1) For each column that is a foreign key in the target table,
>    do an INSERT ... SELECT DISTINCT ... to copy all the values
>    into the appropriate child table.
> 2) For each row in log_entry, do a similar insert to insert
>    the data with the appropriate foreign keys.
> 3) Delete the contents of log_entry using TRUNCATE.
>
> Obviously, this would be very slow when handling tens of
> millions of records. Are there faster approaches to solving
> this problem?

How about something like this:

INSERT INTO url (path)
    (SELECT DISTINCT url FROM log_entry);

Then

INSERT INTO request (req_time, bytes, fk_url)
    (SELECT l.req_time, l.bytes, u.id
     FROM log_entry AS l
     JOIN url AS u ON (l.url = u.path));

I didn't test it, so there may be syntax errors and such,
but I doubt that it can be done much more efficiently.

Creating an index on log_entry(url) *might* improve performance
of the join; check with EXPLAIN.

The TRUNCATE should not be a very expensive operation.

Yours,
Laurenz Albe
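
For reference, a minimal, untested end-to-end sketch of that approach in
PostgreSQL 8.3 syntax. It assumes the table definitions quoted above,
that url and request start out empty, and the index name
log_entry_url_idx is just a placeholder:

BEGIN;

-- Optional: index the join column before the second insert;
-- verify with EXPLAIN whether the planner actually uses it.
CREATE INDEX log_entry_url_idx ON log_entry (url);
ANALYZE log_entry;

-- Step 1: populate the child table with the distinct URLs.
INSERT INTO url (path)
SELECT DISTINCT url FROM log_entry;

-- Step 2: resolve each staging row's URL to its surrogate key
-- and load the main table.
INSERT INTO request (req_time, bytes, fk_url)
SELECT l.req_time, l.bytes, u.id
FROM log_entry AS l
JOIN url AS u ON l.url = u.path;

-- Step 3: empty the staging table (TRUNCATE is transactional,
-- so it rolls back with everything else on failure).
TRUNCATE log_entry;

COMMIT;

One thing to watch: log_entry.req_time is a plain TIMESTAMP while
request.req_time is TIMESTAMP WITH TIME ZONE, so the values are
interpreted according to the session's TimeZone setting during the
insert; set the time zone explicitly first if the log data was recorded
in a different zone.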