Re: Incorrect index being used

Jesse Long <jpl@xxxxxxxxxxxxxx> · Wed, 09 Oct 2013 13:02:05 +0200

On 09/10/2013 12:57, Jesse Long wrote:
On 09/10/2013 12:10, Albe Laurenz wrote:
Jesse Long wrote:
I have the following query, run immediately after executing VACUUM in
the database. There is only one connection to the database.
You should run ANALYZE, not VACUUM.

The query runs for much longer than I expect it to run for, and I think
this is due to it using the incorrect subplan. As you can see, subplans
1 and 3 make use of and index, but these subplans are not used.
Subplans  and 4 are seqscan, and they are used.

How can I get PostgreSQL to use subplan 1 and 3?
They are only possible if an "Index Only Scan" is possible, which
can only be used if the respective table entries are visible for
all transactions.

testdb=> explain analyse SELECT * FROM ARCHIVE_DOCUMENT AS r0 WHERE 
r0.NODE_ID = 29 AND
r0.ARCHIVE_DATE >= '2013-07-08 18:28:00' AND (EXISTS (SELECT r1.* 
FROM ARCHIVE_DOCUMENT_INDEX AS r1
WHERE r1.ARCHIVE_ID = r0.ID AND r1.NODE_ID = r0.NODE_ID AND r1.VALUE 
= 'BSH70002152' ) OR EXISTS (
SELECT r2.* FROM ARCHIVE_DOCUMENT_INDEX AS r2 WHERE r2.ARCHIVE_ID = 
r0.ID AND r2.NODE_ID = r0.NODE_ID
AND r2.VALUE = 'TC212592' ) ) ORDER BY r0.ARCHIVE_DATE DESC LIMIT 10;
[...]

           Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR 
(alternatives: SubPlan 3 or hashed
SubPlan 4))

           Rows Removed by Filter: 710851

           SubPlan 1

             ->  Index Only Scan using 
archive_document_index_x_archive_id_node_id_value on
archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never 
executed)

                   Index Cond: ((archive_id = r0.id) AND (node_id = 
r0.node_id) AND (value =
'BSH70002152'::text))

                   Heap Fetches: 0

           SubPlan 2

             ->  Seq Scan on archive_document_index r1_1 
(cost=0.00..1958104.00 rows=1520 width=16)
(actual time=44418.383..44558.293 rows=4 loops=1)

                   Filter: ((value)::text = 'BSH70002152'::text)

                   Rows Removed by Filter: 95009919

           SubPlan 3

             ->  Index Only Scan using 
archive_document_index_x_archive_id_node_id_value on
archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never 
executed)

                   Index Cond: ((archive_id = r0.id) AND (node_id = 
r0.node_id) AND (value =
'TC212592'::text))

                   Heap Fetches: 0

           SubPlan 4

             ->  Seq Scan on archive_document_index r2_1 
(cost=0.00..1958104.00 rows=1520 width=16)
(actual time=41659.464..41663.342 rows=1 loops=1)

                   Filter: ((value)::text = 'TC212592'::text)

                   Rows Removed by Filter: 95009922
The estimates are quite off.
Does "ANALYZE archive_document", possibly after increasing
default_statistics_target, make a difference?

Yours,
Laurenz Albe

Hi Laurenz,

Thank you for the feedback.

There is no problem with row visibility, there is only one connection 
to the database - the connection I am using to do these selects.

Thanks you for the advise regarding ANALYZE. I ran ANALYZE on both 
tables concerned, but not much changed:

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 

 Limit  (cost=0.56..151.85 rows=10 width=122) (actual 
time=40841.984..85668.213 rows=2 loops=1)

   ->  Index Scan Backward using 
idx_archive_document_x_node_id_archive_date on archive_document r0  
(cost=0.56..7627640.20 rows=504186 width=122) (actual time=40841.98

         Index Cond: ((node_id = 29) AND (archive_date >= '2013-07-08 
18:28:00'::timestamp without time zone))

         Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR 
(alternatives: SubPlan 3 or hashed SubPlan 4))

         Rows Removed by Filter: 710851

         SubPlan 1

           ->  Index Only Scan using 
archive_document_index_x_archive_id_node_id_value on 
archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never 
executed)

                 Index Cond: ((archive_id = r0.id) AND (node_id = 
r0.node_id) AND (value = 'BSH70002152'::text))

                 Heap Fetches: 0

         SubPlan 2

           ->  Seq Scan on archive_document_index r1_1 
(cost=0.00..1958101.80 rows=1568 width=16) (actual 
time=36633.365..40841.909 rows=4 loops=1)

                 Filter: ((value)::text = 'BSH70002152'::text)

                 Rows Removed by Filter: 95009919

         SubPlan 3

           ->  Index Only Scan using 
archive_document_index_x_archive_id_node_id_value on 
archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never 
executed)

                 Index Cond: ((archive_id = r0.id) AND (node_id = 
r0.node_id) AND (value = 'TC212592'::text))

                 Heap Fetches: 0

         SubPlan 4

           ->  Seq Scan on archive_document_index r2_1 
(cost=0.00..1958101.80 rows=1568 width=16) (actual 
time=40241.599..44462.485 rows=1 loops=1)

                 Filter: ((value)::text = 'TC212592'::text)

                 Rows Removed by Filter: 95009922

 Total runtime: 85676.734 ms

(22 rows)

Sorry, I neglected to mention that this is on PostgreSQL 9.3.0.

Thanks,
Jesse

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general