Hi all --

In attempting to run a particular query in postgres, I ran into what appeared to be a pretty severe performance bottleneck. I didn't know whether I'd constructed my query stupidly, or I'd misconfigured postgres in some suboptimal way, or what else might be going on. Though I very much wanted to know the underlying cause of the performance problem, I was also facing the constraint that I was working with a system in production, and the tables my query operated on could not be taken offline. So I did the sensible thing and captured a snapshot of the current production database via a simple pg_dump and played it back into a different postgres installation.

The snapshot copy was installed onto a "development system" with a Pentium D class processor running postgres 8.1 on CentOS 4 32-bit, whereas the production copy lived on a "production system" with a Core 2 Duo class processor running postgres 8.3 on OpenSUSE 10.2 64-bit.

I resubmitted a modified version of the query that had given me problems before, this time issuing the same query to both systems at the same time. To my surprise, the query finished successfully on the development system in under a minute. I let the query continue to run on the production system while I verified the results from the development system -- everything checked out. It took more than a day and a half for the query to complete on the production system. Granted, the production system was continuing to see inserts into the relevant table while my query was running, and granted, the production table had grown since the snapshot (though not by more than 50%), but I found it very difficult to accept that those conditions alone explained such a massive difference in performance.

Here's the query and the response to console, including \timing info, as submitted on the development system:

smirkfp=# insert into dupids ( id ) select id from content where id not in (select min(id) from content group by hash);
INSERT 0 437391
Time: 25733.394 ms

Here is the same query, with very different timing, on the production system:

smirkfp=# insert into dupids ( id ) select id from content where id not in (select min(id) from content group by hash);
INSERT 0 441592
Time: 142622702.430 ms

A little more background: the table of interest, content, had around 1.5M rows on the production system and around 1.1M rows on the development system at the time this query was run. On both systems, the smirkfp database is centered on this one table, content, and has no other large tables or activity of interest. The database is new enough that nothing had previously been vacuumed or analyzed. Neither the development nor the production system had noteworthy processor load or disk activity outside postgres. On the production system, the long-running query above pegged one of the processor cores at ~100% for almost the entire time it was executing.
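For reference, the relevant part of the schema boils down to roughly the following. This is only a sketch: content has other columns I'm leaving out, showing hash as text is just my shorthand, and the dupids definition is the same one-column table I re-create further down.

-- abbreviated sketch of the two tables involved (not the exact DDL)
create table content (
    id   integer,           -- row identifier; this is what gets copied into dupids
    hash text                -- fingerprint used to group duplicate rows (type assumed)
    -- ... other columns omitted ...
);

create table dupids ( id integer );

-- the problem query: collect the id of every row that is not the
-- minimum id within its hash group, i.e. every "duplicate" row
insert into dupids ( id )
select id from content
where id not in ( select min(id) from content group by hash );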
On the development system, I asked postgres to show me the execution plan for my query:

smirkfp=# explain insert into dupids ( id ) select id from content where id not in (select min(id) from content group by hash);
                                    QUERY PLAN
---------------------------------------------------------------------------------
 Seq Scan on content  (cost=204439.55..406047.33 rows=565752 width=4)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  HashAggregate  (cost=204436.55..204439.05 rows=200 width=36)
           ->  Seq Scan on content  (cost=0.00..198779.03 rows=1131503 width=36)
(5 rows)

For comparison, I asked the same thing of the production system:

smirkfp=# explain insert into dupids ( id ) select id from content where id not in (select min(id) from content group by hash);
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on content  (cost=493401.85..9980416861.63 rows=760071 width=4)
   Filter: (NOT (subplan))
   SubPlan
     ->  Materialize  (cost=493401.85..504915.85 rows=646400 width=37)
           ->  GroupAggregate  (cost=468224.39..487705.45 rows=646400 width=37)
                 ->  Sort  (cost=468224.39..472024.74 rows=1520142 width=37)
                       Sort Key: public.content.hash
                       ->  Seq Scan on content  (cost=0.00..187429.42 rows=1520142 width=37)
(8 rows)

Looks pretty different. Next, I thought I'd have the development system analyze the content table and see whether that changed anything:

smirkfp=# analyze content;
ANALYZE
smirkfp=# explain insert into dupids ( id ) select id from content where id not in (select min(id) from content group by hash);
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on content  (cost=480291.35..7955136280.55 rows=582888 width=4)
   Filter: (NOT (subplan))
   SubPlan
     ->  Materialize  (cost=480291.35..492297.85 rows=656050 width=40)
           ->  GroupAggregate  (cost=457245.36..474189.30 rows=656050 width=40)
                 ->  Sort  (cost=457245.36..460159.80 rows=1165776 width=40)
                       Sort Key: hash
                       ->  Seq Scan on content  (cost=0.00..199121.76 rows=1165776 width=40)
(8 rows)

That looks a bit discouraging: as a consequence of the analyze, the development system has switched its execution plan for this query to match the production system's. Let's drop the table dupids, re-create it, and re-run the query on the development system with its new execution plan:

smirkfp=# drop table dupids;
DROP TABLE
Time: 227.843 ms
smirkfp=# create table dupids ( id integer );
CREATE TABLE
Time: 30.980 ms
smirkfp=# insert into dupids ( id ) select id from content where id not in (select min(id) from content group by hash);
Cancel request sent
ERROR:  canceling statement due to user request
smirkfp=#

I grew impatient and canceled the query after letting it run for an hour. Without knowing precisely how long it would have taken to finish under this execution plan, it seems clear enough that the plan is far worse than the original one. How should I approach steering the execution plan back to something more efficient? And what characteristics of the table could have led analyze to push the planner toward the much slower plan?

Thanks in advance,
Davin
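P.S. In case it helps frame answers, here is the kind of session-level experiment I can easily run against the (now analyzed) development copy. My unverified guess is that the hashed-subplan variant is only considered when the planner expects the subquery result to fit in work_mem, so the sketch below just raises work_mem for one session and re-checks the plan. The 131072 kB (128MB) figure is an arbitrary placeholder, not a recommendation, and I haven't confirmed that it changes anything:

-- run in a single psql session on the development copy (postgres 8.1);
-- work_mem is given in kB so the same statement also works on 8.1,
-- which predates the MB/GB unit suffixes
set work_mem = 131072;   -- 128MB, an arbitrary placeholder value

explain
insert into dupids ( id )
select id from content
where id not in ( select min(id) from content group by hash );

-- put the session back to the server default afterwards
reset work_mem;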