Moving from narrow to wide tables

Randall Skelton <skelton@brutus.uwaterloo.ca> · Wed, 7 Apr 2004 18:28:58 -0400

Hi all,

I have begun the slow process of shuffling data from about 125 narrow 
tables into a single wide table and I am seeking some insight on the 
'fastest way.'  The narrow tables are all of the following 
configuration:

===
              Table "cal_0800_time"
  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 timestamp | timestamp with time zone |
 value     | double precision         |
Indexes: cal_0805_time__timestamp

              Table "cal_5v_dig_rl"
  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 timestamp | timestamp with time zone |
 value     | double precision         |
Indexes: cal_5v_dig_rl__timestamp
===

As it stands, the timestamps are not unique but they should be in the 
new table.  I envision something like:

===
CREATE TABLE app_id_800 (
  timestamp timestamp with time zone PRIMARY KEY
    CHECK (timestamp BETWEEN '2003-08-13 02:10:00 +0' AND now()),
  cal_5v_dig_rl   float,
  ...
);
===

Followed by:

===

newtelemetry=> EXPLAIN ANALYZE INSERT INTO app_id_800(timestamp) SELECT 
DISTINCT timestamp FROM cal_0800_time WHERE timestamp BETWEEN 
'2004-02-21 0:00:00 +0' AND '2004-02-21 12:00:00 +0';NOTICE:  QUERY 
PLAN:

Subquery Scan *SELECT*  (cost=0.00..11542.36 rows=1134 width=8) (actual 
time=0.50..1786.02 rows=36219 loops=1)

  ->  Unique  (cost=0.00..11542.36 rows=1134 width=8) (actual 
time=0.47..907.77 rows=36219 loops=1)

        ->  Index Scan using cal_0800_time__timestamp on cal_0800_time  
(cost=0.00..11514.01 rows=11341 width=8) (actual time=0.46..812.19 
rows=37920 loops=1)

Total runtime: 23162.90 msec

EXPLAIN

newtelemetry=> EXPLAIN ANALYZE UPDATE app_id_800 SET cal_ccd_temp = 
cal_ccd_temp.value FROM cal_ccd_temp WHERE app_id_800.timestamp BETWEEN 
'2004-02-21 00:00:00 +0' AND '2004-02-21 12:00:00 +0' AND 
app_id_800.timestamp = cal_ccd_temp.timestamp;

NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..6.89 rows=1 width=538) (actual 
time=1.34..5215.49 rows=37920 loops=1)

  ->  Index Scan using app_id_800_pkey on app_id_800  (cost=0.00..3.02 
rows=1 width=522) (actual time=0.82..1727.18 rows=36219 loops=1)

  ->  Index Scan using cal_ccd_temp__timestamp on cal_ccd_temp  
(cost=0.00..3.86 rows=1 width=16) (actual time=0.04..0.05 rows=1 
loops=36219)

Total runtime: 33979.31 msec

EXPLAIN

... 125 more UPDATE app_id_800 SET commands ...
===

The trouble is that this is taking a very long time when the time 
interval increases.  The total time for the one insert and 125 updates 
(as above) is about 2-4 hrs for 1 day of data (~80K rows).

===

newtelemetry=> EXPLAIN ANALYZE INSERT INTO app_id_800(timestamp) SELECT 
DISTINCT timestamp FROM cal_0800_time WHERE timestamp BETWEEN 
'2004-02-21 0:00:00 +0' AND '2004-02-22 00:00:00 +0';

NOTICE:  QUERY PLAN:

Subquery Scan *SELECT*  (cost=0.00..40791.96 rows=4013 width=8) (actual 
time=0.89..4397.78 rows=72448 loops=1)

  ->  Unique  (cost=0.00..40791.96 rows=4013 width=8) (actual 
time=0.85..2614.95 rows=72448 loops=1)

        ->  Index Scan using cal_0800_time__timestamp on cal_0800_time  
(cost=0.00..40691.63 rows=40130 width=8) (actual time=0.85..2399.50 
rows=101072 loops=1)

Total runtime: 55945.59 msec

EXPLAIN

newtelemetry=> EXPLAIN ANALYZE UPDATE app_id_800 SET cal_ccd_temp = 
cal_ccd_temp.value FROM cal_ccd_temp WHERE app_id_800.timestamp BETWEEN 
'2004-02-21 00:00:00 +0' AND '2004-02-22 00:00:00 +0' AND 
app_id_800.timestamp = cal_ccd_temp.timestamp;

NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..6.89 rows=1 width=538) (actual 
time=1.08..13235.47 rows=101072 loops=1)

  ->  Index Scan using app_id_800_pkey on app_id_800  (cost=0.00..3.02 
rows=1 width=522) (actual time=0.55..3647.76 rows=72448 loops=1)

  ->  Index Scan using cal_ccd_temp__timestamp on cal_ccd_temp  
(cost=0.00..3.86 rows=1 width=16) (actual time=0.05..0.07 rows=1 
loops=72448)

Total runtime: 68472.13 msec

EXPLAIN
===

Any ideas on wow can I speed this along?  I have 4 months of data an 
counting :(

Cheers,
Randall

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

              http://www.postgresql.org/docs/faqs/FAQ.html