Re: Efficiently move data from one table to another, with FK constraints?

Rob W wrote:
> I am using COPY to bulk load large volumes (i.e. multi-GB
> range) of data into a staging table in a PostgreSQL 8.3
> database. For performance, the staging table has no
> constraints, no primary key, etc. I want to move that data
> into the "real" tables, but need some advice on how to do
> that efficiently.
> 
> Here's a simple, abbreviated example of tables and relations 
> I'm working with (in reality there are a lot more columns and 
> foreign keys).
> 
> /* The raw bulk-loaded data. No indexes or constraints. */
> CREATE TABLE log_entry (
>     req_time TIMESTAMP NOT NULL,
>     url TEXT NOT NULL,
>     bytes INTEGER NOT NULL
> );
> 
> /* Where the data will be moved to. Will have indexes, etc.
>    (url is created first so that request can reference it.) */
> CREATE TABLE url (
>     id SERIAL PRIMARY KEY,
>     path TEXT UNIQUE NOT NULL
> );
> 
> CREATE TABLE request (
>     id BIGSERIAL PRIMARY KEY,
>     req_time TIMESTAMP WITH TIME ZONE NOT NULL,
>     bytes INTEGER NOT NULL,
>     fk_url INTEGER NOT NULL REFERENCES url
> );
> 
> Is there a way to move this data in bulk efficiently?
> Specifically, I'm wondering how to handle the foreign keys.
> The naive approach is:
> 
> 1) For each column that is a foreign key in the target table,
> do INSERT ... SELECT DISTINCT ... to copy all the values
> into the appropriate referenced tables.
> 2) For each row in log_entry, do a similar INSERT to load
> the data with the appropriate foreign keys.
> 3) Delete the contents of log_entry using TRUNCATE.
> 
> Obviously, this would be very slow when handling tens of 
> millions of records. Are there faster approaches to solving 
> this problem?

How about something like this:

INSERT INTO url (path) (SELECT DISTINCT url FROM log_entry);

Then

INSERT INTO request (req_time, bytes, fk_url)
   (SELECT l.req_time, l.bytes, u.id FROM log_entry AS l JOIN url AS u ON (l.url = u.path));

I didn't test it, so there may be syntax errors and stuff.

But I doubt that it can be done much more efficiently.
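
If url could already contain some of those paths (e.g. from an earlier load), a variant of the first INSERT that skips them (also untested) would be:

INSERT INTO url (path)
   (SELECT DISTINCT l.url FROM log_entry AS l
    WHERE NOT EXISTS (SELECT 1 FROM url AS u WHERE u.path = l.url));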

Creating an index on log_entry(url) *might* improve performance. Check with EXPLAIN.
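
For example (untested; the index name is just a placeholder):

CREATE INDEX log_entry_url_idx ON log_entry (url);
ANALYZE log_entry;

-- see whether the planner actually uses the index for the join
EXPLAIN
SELECT l.req_time, l.bytes, u.id
FROM log_entry AS l JOIN url AS u ON (l.url = u.path);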

The TRUNCATE should not be a very expensive operation.
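
For completeness, an untested sketch of the whole move as a single transaction (TRUNCATE is transaction-safe in PostgreSQL, so either everything moves or nothing does):

BEGIN;

INSERT INTO url (path)
   (SELECT DISTINCT url FROM log_entry);

INSERT INTO request (req_time, bytes, fk_url)
   (SELECT l.req_time, l.bytes, u.id
    FROM log_entry AS l JOIN url AS u ON (l.url = u.path));

TRUNCATE log_entry;

COMMIT;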

Yours,
Laurenz Albe


