Hi,
I have
create table x (att bigint, val bigint, hash varchar(30));
with 693 million rows. The query
create table y as select att, val, count(*) as cnt from x
group by att, val;
ran for more than 2000 minutes and used 14 GB of memory on a machine
with 8 GB of physical RAM -- eventually I stopped it (see the EXPLAIN
sketch below). Doing
create table y ( att bigint, val bigint, cnt int );
and something like:
  seq 0 255 | xargs -P 6 -I {} \
    psql -c "insert into y select att, val, count(*)
             from x where att % 256 = {} group by att, val" test
runs 6 of the 256 partitions in 10 minutes -- so at that rate the
whole job should take roughly 256/6 * 10 = ~430 minutes, i.e. a bit
over 7 hours, which is still a huge improvement.
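
For reference, this is the kind of plan check I can run on the
original query if it helps (the work_mem value below is just an
example, not my actual setting):

  set work_mem = '256MB';  -- example value only
  explain
  select att, val, count(*) as cnt
  from x
  group by att, val;

My guess is that the memory goes into a hash aggregate over the
distinct (att, val) pairs, but I have not verified that.
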
Question 1: do you see any reason why the second method would yield a
different result from the first method?
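
The kind of sanity check I have in mind here is roughly this -- the
two totals should agree if no rows are missed or double-counted:

  select (select count(*) from x)  as rows_in_x,
         (select sum(cnt) from y)  as rows_counted_in_y;
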
Question 2: is that method generalisable, so that it could be included
in the base system without manual shell glue?
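
To illustrate what I mean by "without manual shell glue", here is a
sequential sketch in plain PL/pgSQL -- it obviously loses the -P 6
parallelism, which is exactly the part I don't know how to express
server-side:

  do $$
  begin
    for i in 0..255 loop
      insert into y
      select att, val, count(*)
      from x
      where att % 256 = i
      group by att, val;
    end loop;
  end
  $$;
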
Thanks,
Oliver