8.4.7, incorrect estimate

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Howdy. We've got a query that takes less than a second unless we add a "order by" to it, after which it takes 40 seconds. Here's the query:

select page_number, ps_id, ps_page_id from ps_page where ps_page_id in (select ps_page_id from documents_ps_page where document_id in (select document_id from temp_doc_ids)) order by ps_page_id;

The parts of the schema used in this query:

                                Table "public.ps_page"
Column | Type | Modifiers
-------------+---------+--------------------------------------------------------------
ps_page_id | integer | not null default nextval('ps_page_ps_page_id_seq'::regclass)
 ps_id       | integer | not null
 page_number | integer | not null
Indexes:
    "ps_page_pkey" PRIMARY KEY, btree (ps_page_id)
    "ps_page_ps_id_key" UNIQUE, btree (ps_id, page_number)

 Table "public.documents_ps_page"
   Column    |  Type   | Modifiers
-------------+---------+-----------
 document_id | text    | not null
 ps_page_id  | integer | not null
Indexes:
    "documents_ps_page_pkey" PRIMARY KEY, btree (document_id, ps_page_id)
    "documents_ps_page_ps_page_id_idx" btree (ps_page_id)

temp_doc_ids (temporary table):
  document_id text not null

The query with the "order by" (slow):

explain analyze select page_number, ps_id, ps_page_id from ps_page where ps_page_id in (select ps_page_id from documents_ps_page where document_id in (select document_id from temp_document_ids)) order by ps_page_id Merge Semi Join (cost=212570.02..3164648.31 rows=34398932 width=12) (actual time=54749.281..54749.295 rows=5 loops=1)
      Merge Cond: (ps_page.ps_page_id = documents_ps_page.ps_page_id)
-> Index Scan using ps_page_pkey on ps_page (cost=0.00..2999686.03 rows=86083592 width=12) (actual time=0.029..36659.393 rows=85591467 loops=1) -> Sort (cost=18139.39..18152.52 rows=6255 width=4) (actual time=0.080..0.083 rows=5 loops=1)
            Sort Key: documents_ps_page.ps_page_id
            Sort Method:  quicksort  Memory: 25kB
-> Nested Loop (cost=26.23..17808.09 rows=6255 width=4) (actual time=0.044..0.073 rows=5 loops=1) -> HashAggregate (cost=26.23..27.83 rows=200 width=32) (actual time=0.015..0.017 rows=5 loops=1) -> Seq Scan on temp_document_ids (cost=0.00..23.48 rows=1310 width=32) (actual time=0.004..0.007 rows=5 loops=1) -> Index Scan using documents_ps_page_pkey on documents_ps_page (cost=0.00..88.59 rows=31 width=42) (actual time=0.009..0.010 rows=1 loops=5) Index Cond: (documents_ps_page.document_id = (temp_document_ids.document_id)::text)
    Total runtime: 54753.028 ms

The query without the "order by" (fast):

production=> explain analyze select page_number, ps_id, ps_page_id from ps_page where ps_page_id in (select ps_page_id from documents_ps_page where document_id in (select document_id from temp_doc_ids));

   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=17821.42..87598.71 rows=34398932 width=12) (actual time=0.099..0.136 rows=5 loops=1) -> HashAggregate (cost=17821.42..17871.46 rows=6255 width=4) (actual time=0.083..0.096 rows=5 loops=1) -> Nested Loop (cost=26.23..17808.28 rows=6255 width=4) (actual time=0.047..0.076 rows=5 loops=1) -> HashAggregate (cost=26.23..27.83 rows=200 width=32) (actual time=0.014..0.015 rows=5 loops=1) -> Seq Scan on temp_doc_ids (cost=0.00..23.48 rows=1310 width=32) (actual time=0.005..0.005 rows=5 loops=1) -> Index Scan using documents_ps_page_pkey on documents_ps_page (cost=0.00..88.59 rows=31 width=42) (actual time=0.010..0.010 rows=1 loops=5) Index Cond: (documents_ps_page.document_id = temp_doc_ids.document_id) -> Index Scan using ps_page_pkey on ps_page (cost=0.00..11.14 rows=1 width=12) (actual time=0.007..0.007 rows=1 loops=5)
         Index Cond: (ps_page.ps_page_id = documents_ps_page.ps_page_id)
 Total runtime: 0.213 ms
(10 rows)

We notice that in all cases, the plans contain some estimated row counts that differ quite a bit from the actual row counts. We tried increasing (from 100 to 1,000 and 10,000) the statistics targets for each of the indexed columns, one at a time, and analyzing the table/column with each change. This had no effect.

Postgres version 8.4.7 on AMD64, Debian Linux "wheezy" (aka "testing").

Where should we look next?

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux