On 6/3/09 8:42 AM, "Davin Potts" <davin@xxxxxxxxxxxxxx> wrote:

> Hi all --
>
> A little more background: The table of interest, content, has around
> 1.5M rows on the production system and around 1.1M rows on the
> development system at the time this query was run. On both systems, the
> smirkfp databases are centered around this table, content, and have no
> other large tables or activities of interest outside this table. The
> database is sufficiently new that no time had been taken to vacuum or
> analyze anything previously. Neither the development nor production
> system had noteworthy processor load or disk activity outside postgres.
> On the production system, the above long-running query was pegging one
> of the processor cores at ~100% for almost the whole time of the
> query's execution.
>
> On the development system, I asked postgres to show me the execution
> plan for my query:
>
> smirkfp=# explain insert into dupids ( id ) select id from content
> where id not in (select min(id) from content group by hash);
>                                QUERY PLAN
> ---------------------------------------------------------------------------
>  Seq Scan on content  (cost=204439.55..406047.33 rows=565752 width=4)
>    Filter: (NOT (hashed subplan))
>    SubPlan
>      ->  HashAggregate  (cost=204436.55..204439.05 rows=200 width=36)
>            ->  Seq Scan on content  (cost=0.00..198779.03 rows=1131503 width=36)
> (5 rows)
>
> For comparison, I asked the same thing of the production system:
>
> smirkfp=# explain insert into dupids ( id ) select id from content
> where id not in (select min(id) from content group by hash);
>                                QUERY PLAN
> ---------------------------------------------------------------------------
>  Seq Scan on content  (cost=493401.85..9980416861.63 rows=760071 width=4)
>    Filter: (NOT (subplan))
>    SubPlan
>      ->  Materialize  (cost=493401.85..504915.85 rows=646400 width=37)
>            ->  GroupAggregate  (cost=468224.39..487705.45 rows=646400 width=37)
>                  ->  Sort  (cost=468224.39..472024.74 rows=1520142 width=37)
>                        Sort Key: public.content.hash
>                        ->  Seq Scan on content  (cost=0.00..187429.42 rows=1520142 width=37)
> (8 rows)
>
> Looks pretty different. Next, I thought I'd try asking the development
> system to analyze the table, content, and see if that changed anything:
>
> smirkfp=# analyze content;
> ANALYZE
> smirkfp=# explain insert into dupids ( id ) select id from content
> where id not in (select min(id) from content group by hash);
>                                QUERY PLAN
> ---------------------------------------------------------------------------
>  Seq Scan on content  (cost=480291.35..7955136280.55 rows=582888 width=4)
>    Filter: (NOT (subplan))
>    SubPlan
>      ->  Materialize  (cost=480291.35..492297.85 rows=656050 width=40)
>            ->  GroupAggregate  (cost=457245.36..474189.30 rows=656050 width=40)
>                  ->  Sort  (cost=457245.36..460159.80 rows=1165776 width=40)
>                        Sort Key: hash
>                        ->  Seq Scan on content  (cost=0.00..199121.76 rows=1165776 width=40)
> (8 rows)
>
> How should I approach steering the execution plan back to something
> more efficient? What characteristics of the table could have induced
> analyze to suggest the much slower query plan?

When the table was analyzed, Postgres found many more distinct values
for hash than the default assumption (the distinct-value estimate). If
Postgres thinks the hash aggregate won't fit in work_mem, it will go to
a sort -> GroupAggregate plan even if it estimates the sort plan to take
thousands of times more effort.

Solutions:

1. If this estimate is wrong, try increasing the statistics target on
the column(s) in question and re-analyzing. Or, as a test, change the
global default_statistics_target and experiment. From the explain output
above, it appears this approach may have only a moderate effect. Is the
estimate of ~650,000 distinct values for the hash column accurate?

2. Increase work_mem so that the hash fits in memory, letting the
planner use the hash-aggregate plan rather than sorting the whole table.
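The two remedies above can be sketched as a psql session. This is a sketch only: the table and column names (content, hash) come from the query in the thread, while the statistics target of 1000 and the 64MB value are illustrative starting points, not recommendations.

```sql
-- Remedy 1: sample the grouped column more thoroughly, then re-analyze
-- so the planner refreshes its distinct-value estimate for hash.
ALTER TABLE content ALTER COLUMN hash SET STATISTICS 1000;
ANALYZE content;

-- Remedy 2: give this session enough work_mem for the hash aggregate,
-- then check whether the plan flips back to HashAggregate.
SET work_mem = '64MB';
EXPLAIN insert into dupids ( id )
  select id from content
  where id not in (select min(id) from content group by hash);
```

SET only affects the current session, so this is safe to experiment with before deciding whether to change anything globally.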
The explain from your production db above expects to hash about
~650,000 distinct rows of width 37. That should fit in 32MB of RAM or
so. Try work_mem of 16MB, 32MB, and 64MB (and perhaps even 128MB or
larger on the test box) and see if the explain changes:

set work_mem = '32MB';
explain <your query>;

To see what your current work_mem is, do:

show work_mem;

If this approach works, you can either set work_mem just before running
the query, or change it globally. It is set low by default because if
all your connections are busy doing work that requires work_mem, you can
end up using RAM at up to about (2 * work_mem * active connections). On
my larger db, the back-end aggregate-processing connections use
work_mem = 750MB, which allows queries with 150M+ rows to aggregate in
minutes rather than sort for half a day. However, we carefully cap the
number of processes that can concurrently run such queries.

> Thanks in advance,
>
> Davin
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
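As a postscript, the "should fit in 32MB" figure is easy to sanity-check from the plan's own numbers. A back-of-envelope sketch (it deliberately ignores PostgreSQL's per-entry hash-table overhead, which is one reason to also try 64MB or more):

```sql
-- Raw payload of the hashed rows from the production EXPLAIN:
-- ~650,000 rows * 37 bytes each, expressed in MiB.
SELECT 650000 * 37 / (1024 * 1024.0) AS payload_mib;  -- roughly 23 MiB
```

Since the raw payload is already around 23 MiB, the real in-memory hash table will be somewhat larger once overhead is included, which is consistent with the planner abandoning HashAggregate at a smaller work_mem setting.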