Hi, I've noticed something that I find strange with the hash-aggregate feature of Postgres. I'm currently running Postgres v8.4.1 on Debian Linux 64-bit. I have a simple query that when planned either uses hash-aggregates or a sort depending on the amount of working memory available. The problem is that when it uses the hash-aggregates, the query runs 25% slower than when using the sort method. The table in question contains about 60 columns, many of which are boolean, 32-bit integers and some are 64-bit integers. Many fields are text - and some of these can be quite long (eg 32Kb). The SQL is as follows: explain analyse select distinct T1.* from role T1 where T1.endDate is null and T1.latest=true and T1.active=true and T1.deceased=false and T1.desk in (BIG LIST OF INTEGERS); select version() --> "PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10), 64-bit" show enable_hashagg --> "on" set work_mem='8MB' show work_mem --> "8MB" Explain analyse of the SQL above: Unique (cost=47033.71..48410.27 rows=8881 width=1057) (actual time=18.803..38.969 rows=6449 loops=1) -> Sort (cost=47033.71..47055.91 rows=8881 width=1057) (actual time=18.801..20.560 rows=6449 loops=1) Sort Key: id, version, latest, active, deceased, person, formalnotes, informalnotes, description, desk, rolelevel, roletype, promotiondate, primaryrole, headofplace, careergrading, startdate, enddate, percentsalary, deskf, rolelevelf, roletypef, promotiondatef, primaryrolef, headofplacef, careergradingf, startdatef, enddatef, percentsalaryf, descriptionf, deskmv, rolelevelmv, roletypemv, promotiondatemv, primaryrolemv, headofplacemv, careergradingmv, startdatemv, enddatemv, percentsalarymv, descriptionmv, hasattachments, hasrelationships, hasprojects, audwho, audwhen, audcreated, costcentre, reportsto, manages, startdateest, enddateest, hasstarperformers, projectnames, sourcefrom, sourceto, checkedwho, checkedwhen, checkednotes, hasqueries, querytitles Sort Method: quicksort Memory: 2001kB -> Bitmap Heap Scan on role t1 (cost=4888.59..42321.27 rows=8881 width=1057) (actual time=7.041..12.504 rows=6449 loops=1) Recheck Cond: (desk = ANY ('BIG LIST OF INTEGERS'::bigint[])) Filter: ((enddate IS NULL) AND latest AND active AND (NOT deceased)) -> Bitmap Index Scan on role_ix2 (cost=0.00..4886.37 rows=10984 width=0) (actual time=6.948..6.948 rows=9296 loops=1) Index Cond: ((latest = true) AND (active = true) AND (deceased = false) AND (desk = ANY ('BIG LIST OF INTEGERS'::bigint[]))) Total runtime: 40.777 ms This execution of the query used a sort to perform the "distinct". Now for the second run: select version() --> "PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10), 64-bit" show enable_hashagg --> "on" set work_mem='64MB' show work_mem --> "64MB" Explain analyse of the SQL above: HashAggregate (cost=43675.63..43764.44 rows=8881 width=1057) (actual time=46.556..55.694 rows=6449 loops=1) -> Bitmap Heap Scan on role t1 (cost=4888.59..42321.27 rows=8881 width=1057) (actual time=7.179..13.023 rows=6449 loops=1) Recheck Cond: (desk = ANY ('BIG LIST OF INTEGERS'::bigint[])) Filter: ((enddate IS NULL) AND latest AND active AND (NOT deceased)) -> Bitmap Index Scan on role_ix2 (cost=0.00..4886.37 rows=10984 width=0) (actual time=7.086..7.086 rows=9296 loops=1) Index Cond: ((latest = true) AND (active = true) AND (deceased = false) AND (desk = ANY ('BIG LIST OF INTEGERS'::bigint[]))) Total runtime: 57.536 ms I've tested this with v8.4.4 as well with the same results. I also tested the same query with our previous production version of Postgres (v8.3.8) and that version only appears to use sorting not hash-aggregates. Obviously, I can re-write the query to use a "distinct on (...)" clause to improve performance - which is what I've done, but my question is: Why is the hash-aggregate slower than the sort? Is it something to do with the number of columns? ie. When sorting, the first few columns defined on the table (id, version) make the row unique - but when using the hash-aggregate feature, presumably every column needs to be hashed which takes longer especially for long text fields? Thanks, --Jatinder -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance