PostgreSQL 8.4.8 on CentOS 5.6, x86_64. Default settings except work_mem = 1MB.

NOTE: I am using partitioned tables here, and was querying the 'master' table. Perhaps this is a known issue.

I ran a query recently whose result set was very large. The outermost part of the query plan looked like this:

HashAggregate  (cost=56886512.96..56886514.96 rows=200 width=30)
  ->  Result  (cost=0.00..50842760.97 rows=2417500797 width=30)

The row count for 'Result' is in the right ballpark, but why does HashAggregate think it can turn 2 *billion* rows of strings (an average of 30 bytes long) into only 200? This is my primary concern. If I don't disable hash aggregation, PostgreSQL quickly consumes huge quantities of memory and eventually gets killed by the OOM killer.

After manually disabling hash aggregation, I ran the same query. It has been running for over two days now. The disk is busy, but the amount of data actually transferred is very low. Total data size is approx. 250GB, perhaps a bit less. The query scans 160 or so tables for data.

If I instead use a DISTINCT + UNION over each table, the plan looks like this:

Unique  (cost=357204094.44..357318730.75 rows=22927263 width=28)
  ->  Sort  (cost=357204094.44..357261412.59 rows=22927263 width=28)

23 million rows is more like it, and the cost is much lower. What is the possibility that DISTINCT/UNIQUE operations can be pushed "down" into queries during the planning stage, to see whether they are less expensive? In this case, PostgreSQL believes (probably correctly; I'll let you know) that distinct(column foo from tableA + column foo from tableB + column foo from tableC ...) is more expensive than distinct(distinct column foo from tableA + distinct column foo from tableB ...). A sketch of the two forms is in the P.S. below.

--
Jon
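P.S. For concreteness, a rough sketch of the two query forms, using the placeholder names from above (foo, tableA, tableB, ...) rather than my real schema, and with the partition list cut down to three of the ~160 tables:

-- Form 1: DISTINCT over the partitioned parent ('master') table.
-- The planner estimates only 200 distinct values out of ~2.4 billion
-- rows, so it picks a HashAggregate that exhausts memory unless hash
-- aggregation is disabled for the session (SET enable_hashagg = off;).
SELECT DISTINCT foo
FROM master_table;

-- Form 2: deduplicate each partition separately, then let UNION
-- (which removes duplicates itself) merge the much smaller per-table
-- result sets.
SELECT DISTINCT foo FROM tableA
UNION
SELECT DISTINCT foo FROM tableB
UNION
SELECT DISTINCT foo FROM tableC;
-- ... and so on for the remaining partitions.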