With Postgresql 8.1.9 -- I have a simple group by query:
SELECT target_page_id, min(created_at)
FROM page_page_link
GROUP BY 1;
The page_page_link table has ~130 million rows.
After analyzing the table, the planner picks a hash aggregate plan, which results in an out of memory error.
crystal=> analyze page_page_link;
ANALYZE
crystal=> explain
crystal-> SELECT target_page_id as page_id, min(created_at) as created_at
crystal-> FROM page_page_link
crystal-> GROUP By 1
crystal-> ;
QUERY PLAN
-----------------------------------------------------------------------------------
HashAggregate (cost=3663517.88..3670393.09
rows=550017 width=12)
-> Seq Scan on page_page_link (cost=0.00..2993649.92 rows=133973592 width=12)
(2 rows)
The default_statistics_target was originally 200.
I upped it to 1000 and still get the same results.
crystal=> show default_statistics_target;
default_statistics_target
---------------------------
1000
(1 row)
crystal=> set enable_hashagg = off;
SET
crystal=> explain
crystal-> SELECT target_page_id as page_id, min(created_at) as created_at
crystal-> FROM page_page_link
crystal-> GROUP BY 1
crystal-> ;
QUERY PLAN
-----------------------------------------------------------------------------------------
GroupAggregate (cost=27240841.37..28252518.53 rows=550017 width=12)
-> Sort (cost=27240841.37..27575775.35 rows=133973592 width=12)
Sort Key: target_page_id
-> Seq Scan on page_page_link (cost=
0.00..2993649.92 rows=133973592 width=12)
(4 rows)
crystal=>
I am working around this by setting enable_hashagg = off -- but it just seems like a case where the planner is not picking the strategy?
Is there another setting I can change to help make better decisions?
thanks in advance,
Mason