Re: Query optimization

David Johnston <david.g.johnston@xxxxxxxxx> · Wed, 29 Oct 2014 12:05:41 -0700

On Wed, Oct 29, 2014 at 11:53 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Jorge Arevalo <jorgearevalo@xxxxxxxxxxxx> writes:

> This is the result of EXPLAIN ANALYZE

>                                                                    QUERY

> PLAN

> -------------------------------------------------------------------------------------------------------------------------------------------------

>  Index Scan using table1_pkey on table1  (cost=67846.38..395773.45

> rows=8419127 width=88) (actual time=7122.704..22670.680 rows=8419127

> loops=1)

>    InitPlan 2 (returns $1)

>      ->  Result  (cost=67846.29..67846.29 rows=1 width=0) (actual

> time=7009.063..7009.065 rows=1 loops=1)

>            InitPlan 1 (returns $0)

>              ->  Seq Scan on table2 p  (cost=0.00..67846.29 rows=12689

> width=20) (actual time=14.971..5069.840 rows=2537787 loops=1)

>                    Filter: (f3 = field7)

Hm.  If I'm reading that right, you're building an array containing

2537787 entries, each of which is a composite datum containing two

columns of unmentioned datatypes.  I suspect a big chunk of your

runtime is going into manipulating that array -- PG is not terribly

efficient with big arrays containing variable-width values.

I'm also a bit confused as to why the planner is saying that the (SELECT

ARRAY(...)) bit is an InitPlan and not a SubPlan.  That implies that

"field7" in the innermost WHERE clause is not a reference to table1 but a

reference to table2.  Is that really what you meant?  IOW, are you sure

this query is performing the right calculation in the first place?

I thought the InitPlan was in place because the planner choose to execute the correlated subquery as a standalone query since it realizes that it is going to have to end up processing the entire table anyway due to the lack of a filter on the outer query.  In effect executing "table1 JOIN (table2 subquery) ON (f3 = field7)".

David J.