Hi all,
PostgreSQL version: 8.2.0 on Linux xeonito 2.6.19.3 #1 SMP Mon Feb 12
18:57:16 EET 2007 i686 i686 i386 GNU/Linux
Slackware 11.0.0
Memory parameters from postgresql.conf:
shared_buffers = 512MB
work_mem = 128MB
maintenance_work_mem = 512MB
max_fsm_pages = 1638400
effective_cache_size = 2200MB
cat /proc/sys/kernel/shmmax - 4000000000
The memory parameters are set on the low side because I want to avoid
"out of memory" errors when a few queries run in parallel.
I'm trying to create a table by selecting the data from a prototype view.
The prototype view uses data aggregated by another view, which combines
(UNION ALL) data from 3 tables: ~14 GB, ~15 million rows in total.
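To give an idea of the structure, here is a heavily simplified sketch
(the "traffic" column and the aggregates are made up; the other names
follow the plan below, and the real views are more complex):

CREATE VIEW allbts_new_proto AS
    SELECT bts_sub, cell, "timestamp", bsc,
           sum(traffic) AS traffic              -- traffic is made up
      FROM nbts_raw
     GROUP BY bts_sub, cell, "timestamp", bsc
    UNION ALL
    SELECT bts_sub, cell, "timestamp", bsc, sum(traffic)
      FROM bts
     GROUP BY bts_sub, cell, "timestamp", bsc
    UNION ALL
    SELECT a.bts_sub, a.cell, a."timestamp", a.bsc, sum(b.traffic)
      FROM ebts_raw a
      JOIN ebts b ON (a.cell = b.cell
                  AND a."timestamp" = b."timestamp"
                  AND a.bsc = b.bsc)
     GROUP BY a.bts_sub, a.cell, a."timestamp", a.bsc;

CREATE VIEW clusters_raw_proto AS
    SELECT c.bts, v."timestamp", sum(v.traffic) AS traffic
      FROM (SELECT bts_sub, "timestamp", sum(traffic) AS traffic
              FROM allbts_new_proto
             GROUP BY bts_sub, "timestamp") v
      JOIN clusters c ON (v.bts_sub = c.bts)
     GROUP BY c.bts, v."timestamp";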
CREATE TABLE aa_clusters_raw AS SELECT * FROM clusters_raw_proto;
ERROR: out of memory
DETAIL: Failed on request of size 194.
And the EXPLAIN output:
EXPLAIN SELECT * FROM clusters_raw_proto;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=6522535.23..6523961.81 rows=19021 width=340)
   ->  Hash Join  (cost=6517053.40..6520202.32 rows=54892 width=340)
         Hash Cond: (b.bts_sub = clusters.bts)
         ->  HashAggregate  (cost=6517032.51..6518932.51 rows=40000 width=488)
               ->  Subquery Scan allbts_new_proto  (cost=2016810.08..6461438.07 rows=1389861 width=488)
                     ->  Append  (cost=2016810.08..6444064.81 rows=1389861 width=496)
                           ->  Subquery Scan "*SELECT* 1"  (cost=2016810.08..2421192.58 rows=431342 width=496)
                                 ->  GroupAggregate  (cost=2016810.08..2416879.16 rows=431342 width=496)
                                       ->  Sort  (cost=2016810.08..2027593.61 rows=4313411 width=496)
                                             Sort Key: cell, "timestamp", bsc
                                             ->  Seq Scan on nbts_raw  (cost=0.00..575789.11 rows=4313411 width=496)
                           ->  Subquery Scan "*SELECT* 2"  (cost=2211725.24..2689544.91 rows=888967 width=118)
                                 ->  GroupAggregate  (cost=2211725.24..2680655.24 rows=888967 width=118)
                                       ->  Sort  (cost=2211725.24..2233949.41 rows=8889668 width=118)
                                             Sort Key: cell, "timestamp", bsc
                                             ->  Seq Scan on bts  (cost=0.00..623575.68 rows=8889668 width=118)
                           ->  Subquery Scan "*SELECT* 3"  (cost=1260555.90..1333327.33 rows=69552 width=180)
                                 ->  GroupAggregate  (cost=1260555.90..1332631.81 rows=69552 width=180)
                                       ->  Merge Join  (cost=1260555.90..1322504.14 rows=175098 width=180)
                                             Merge Cond: ((a.cell = b.cell) AND (a."timestamp" = b."timestamp") AND (a.bsc = b.bsc))
                                             ->  Sort  (cost=127343.39..129082.16 rows=695510 width=151)
                                                   Sort Key: a.cell, a."timestamp", a.bsc
                                                   ->  Seq Scan on ebts_raw a  (cost=0.00..59852.10 rows=695510 width=151)
                                             ->  Sort  (cost=1133212.51..1146524.33 rows=5324730 width=59)
                                                   Sort Key: b.cell, b."timestamp", b.bsc
                                                   ->  Seq Scan on ebts b  (cost=0.00..338126.30 rows=5324730 width=59)
         ->  Hash  (cost=18.51..18.51 rows=951 width=22)
               ->  Seq Scan on clusters  (cost=0.00..18.51 rows=951 width=22)
The resulting table should be less than 100 MB, because of the
aggregation.
Is there a way to tell PostgreSQL to start writing the new table before
all the "calculations" have finished (something like an automatic use of
cursors)? I know it's possible by splitting the data with WHERE, but is
there another method?
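What I mean by splitting with WHERE is roughly this (the date ranges
are just examples):

CREATE TABLE aa_clusters_raw AS
    SELECT * FROM clusters_raw_proto LIMIT 0;   -- empty table, same structure

INSERT INTO aa_clusters_raw
    SELECT * FROM clusters_raw_proto
     WHERE "timestamp" >= '2007-01-01' AND "timestamp" < '2007-02-01';

INSERT INTO aa_clusters_raw
    SELECT * FROM clusters_raw_proto
     WHERE "timestamp" >= '2007-02-01' AND "timestamp" < '2007-03-01';

-- ... one INSERT per month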
If I split the 3 "big" tables into smaller pieces ... for example
storing the data on a monthly basis and then using an aggregation VIEW
with timestamp constraints (see the sketch below) ... the operation
would be faster, but I'm not sure it would still be possible to create
just one table combining all the results.
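Something like this is what I have in mind, sketched for one of the
tables with made-up month boundaries (inheritance plus CHECK
constraints, as 8.2 supports):

-- Monthly children of bts.
CREATE TABLE bts_2007_01 (
    CHECK ("timestamp" >= '2007-01-01' AND "timestamp" < '2007-02-01')
) INHERITS (bts);

CREATE TABLE bts_2007_02 (
    CHECK ("timestamp" >= '2007-02-01' AND "timestamp" < '2007-03-01')
) INHERITS (bts);

-- With constraint exclusion on, a query that filters on "timestamp"
-- should scan only the matching children:
SET constraint_exclusion = on;
SELECT * FROM bts
 WHERE "timestamp" >= '2007-01-01' AND "timestamp" < '2007-02-01';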
Any suggestions are welcome :)
Regards,
Peter Petrov