Re: Incorrect index being used

Jesse Long <jpl@xxxxxxxxxxxxxx> · Fri, 11 Oct 2013 15:32:21 +0200

On 09/10/2013 18:06, Tom Lane wrote:
Jesse Long <jpl@xxxxxxxxxxxxxx> writes:
The query runs for much longer than I expect it to run for, and I think
this is due to it using the incorrect subplan. As you can see, subplans
1 and 3 make use of and index, but these subplans are not used.
Subplans  and 4 are seqscan, and they are used.
How can I get PostgreSQL to use subplan 1 and 3?
You can't, and you would not like the results if you did.

The construct that's being described (perhaps not very intelligibly)
by this EXPLAIN output is an alternative pair of subplans.  Actually
there are two such alternative pairs in this example.  The indexscan
variants are subplans that would be fast if executed only once or
twice.  The seqscan variants, if used, are used to load a hashtable
that is then probed for each row of the outer plan.  If there are a
lot of rows to be considered in the outer plan, then it's better to
pay the price of loading the hashtable, because each hashtable probe
will be a lot cheaper than doing a fresh indexscan with the comparison
value from the current outer row.

In this example, we can see that the outer scan that the subplans
are attached to eliminated 710851 rows by means of the subplan filters,
meaning that the subplans were probed 710851+2 times.  If each of those
probes had been done with a separate indexscan, you'd likely still be
waiting for the result.  Using the seqscan+hashtable was definitely the
right choice here.

BTW, the reason it looks like this rather than just hard-wiring the
seqscan choice is a planner implementation artifact --- at the time
that the subplan plans are created, we don't know how many rows are
expected to pass through the outer plan level.  So we plan it both
ways and leave the choice to be made during executor startup.

What I'd suggest is that you see if you can't get rid of the "EXISTS() OR
EXISTS()" construction in favor of a single EXISTS clause --- I'm too lazy
to work out the details but it looks like you could do the OR in the WHERE
clause of a single EXISTS sub-select.  That would allow the planner to
convert the EXISTS into a semi-join, which might work better than what
you've got.  As is, you're dealing with fairly generic sub-select logic
that isn't going to be terribly well optimized.

Hi Tom,

I understand what you say about using the index (archive_id, node_id, 
value) to do a separate lookup for each row in the archive_document 
table that would be filtered. I understand that this would be constly.

However, the seqscan is killing me. I have another index on 
archive_document_index which has been there all along - (node_id, 
value). Would it not be better for PostgreSQL to use this index to 
perform an index scan instead of a seqscan when populating the hash table?

explain select * from archive_document_index where node_id = 29 and 
value = 'BSH70002152';
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Index Scan using archive_document_index_node_id_value_idx on 
archive_document_index  (cost=0.57..36.13 rows=14 width=33)
   Index Cond: ((node_id = 29) AND ((value)::text = 'BSH70002152'::text))
(2 rows)

Thanks,
Jesse

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general