Re: Query planner question

Soni M <diptatapa@xxxxxxxxx> · Fri, 22 Aug 2014 19:26:21 +0700

On Thu, Aug 21, 2014 at 9:26 AM, David G Johnston <david.g.johnston@xxxxxxxxx> wrote:

Soni M wrote

> Hi Everyone,

>

> I have this query :

>

> select t.ticket_id ,

> tb.transmission_id

> from ticket t,

> transmission_base tb

> where t.latest_transmission_id = tb.transmission_id

> and t.ticket_number = tb.ticket_number

> and tb.parse_date > ('2014-07-31');

>

> Execution plan: http://explain.depesz.com/s/YAak

>

> Indexes on ticket :

>     "ticket_pkey" PRIMARY KEY, btree (ticket_id) CLUSTER

>     "ticket_by_latest_transmission" btree (latest_transmission_id)

>     "ticket_by_ticket_number" btree (ticket_number)

>

> This query only returns some portions of rows from ticket table.

> The question is, Why does postgres need to get all the rows from ticket

> table in order to complete this query?

> Can't postgres use indexes to get only needed rows on ticket table?

>

> I try set seqscan to off, but still index scan try to get all rows on

> ticket table.

> Here's the execution plan : http://explain.depesz.com/s/abH2

Short answer: you haven't defined "(latest_transmission_id, ticket_number)"

as being a foreign key onto the transmission_base table yet you seem to want

it to act like one.

Currently we have only latest_transmission_id as FK, described here :
TABLE "ticket" CONSTRAINT "fkcbe86b0c6ddac9e" FOREIGN KEY (latest_transmission_id) REFERENCES transmission_base(transmission_id)

Change the query to include only FK still result the same:
explain select t.ticket_id ,
tb.transmission_id
from ticket t,
transmission_base tb

where t.latest_transmission_id = tb.transmission_id
and tb.parse_date > ('2014-07-31');
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=113928.06..2583606.96 rows=200338 width=8)
   Hash Cond: (t.latest_transmission_id = tb.transmission_id)
   ->  Seq Scan on ticket t  (cost=0.00..1767767.26 rows=69990826 width=8)
   ->  Hash  (cost=108923.38..108923.38 rows=400374 width=4)
         ->  Index Scan using transmission_base_by_parse_date on transmission_base tb  (cost=0.00..108923.38 rows=400374 width=4)

               Index Cond: (parse_date > '2014-07-31 00:00:00'::timestamp without time zone)
(6 rows)

I've googling this one, it seems that's how hash join works. For hash join operation, the join predicate cannot be used for the index scan, only independent predicate can be used in index scan. http://use-the-index-luke.com/sql/join/hash-join-partial-objects

Because of this failure the planner considers the following:

Nested Looping over 380,000 records is going to suck so it tries some

advanced "merge/join" techniques to try and speed things up.  In any such

alternative the entire ticket table needs to be considered since there is no

constraint provided for that table - the only constraint in on

transmission_base and it rightly is using an index to find records matching

the where clause.

Since ticket_number and latest_transmission_id are found in separate indexes

I do not believe the planner can make use of an Index Only scan to fulfill

the join so each index lookup would require a corresponding heap lookup

which means extra work compared to just sequentially scanning the heap in

the first place.  Since it is going to hit the entire thing in either case

the sequential scan is the logical choice for it to make.

Others will correct any factual mistakes I may have made - I am theorizing

here and do not understand the planner sufficient well to be 100% certain

that an FK definition will solve the problem.

David J.

--

View this message in context: http://postgresql.1045698.n5.nabble.com/Query-planner-question-tp5815659p5815661.html

Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--

Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)

To make changes to your subscription:

http://www.postgresql.org/mailpref/pgsql-general

-- 
Regards,

Soni Maula Harriz