Just a note about column equivalence disarming the planner

Shaun Thomas <sthomas@xxxxxxxxx> · Fri, 8 Jul 2011 15:20:31 -0500

Hello folks,

This isn't really a problem, so much as an observation of just how much 
the internals have changed over the years. We've got an older version 
we're hoping to upgrade soon, and a developer asked me to optimize this 
today:

SELECT order_id
  FROM order
 WHERE order_id = root_order_id
   AND order_id IN (
         SELECT DISTINCT m.root_order_id
           FROM wacky_orders c
           JOIN order m USING (root_order_id)
          WHERE m.order_type = 'regular'
          GROUP BY m.root_order_id, m.route_id
         HAVING COUNT(1) > 1
       );

From what I could tell, the query was fine. But this part of the 
explain was confusing the hell out of me:

->  Seq Scan on order  (cost=0.00..218943.98 rows=24092 width=20)
     Filter: (order_id = root_order_id)

The thing is, that subquery there only produced 150 rows. So I shrugged 
and simplified further by making a temp table, and got this:

SELECT order_id
  FROM order m
  JOIN zany_orders z ON (m.order_id = z.root_order_id)
 WHERE m.order_id = m.root_order_id;

Which produced this:

Merge Join  (cost=220705.29..220826.19 rows=1 width=10)
 Merge Cond: (m.order_id = z.root_order_id)
 ->  Sort  (cost=220697.42..220757.65 rows=24092 width=20)
       Sort Key: m.order_id
       ->  Seq Scan on order m  (cost=0.00..218943.98 rows=24092 width=20)
             Filter: (order_id = root_order_id)
 ->  Sort  (cost=7.87..8.24 rows=149 width=11)
       Sort Key: z.root_order_id
       ->  Seq Scan on zany_orders z  (cost=0.00..2.49 rows=149 width=11)

Ok, now it's just screwing with me. The order table has about 5M rows, 
and this is clearly not a good idea, here. But then I took a closer 
look. Why did it decide to filter based on a condition 90% of the table 
fits, and then *merge* those results in with the 150-row temp table?

So, for giggles, I cast a column type to tell the planner it shouldn't 
consider the columns equivalent:

SELECT master_order_id
  FROM order m
  JOIN zany_orders z ON (m.order_id = z.root_order_id)
 WHERE m.order_id::VARCHAR = m.root_order_id;

And voila:

Nested Loop  (cost=0.00..839.82 rows=1 width=8)
  ->  Seq Scan on zany_orders z  (cost=0.00..2.49 rows=149 width=11)
  ->  Index Scan using order_pkey on order m  (cost=0.00..5.60 rows=1 
width=8)
        Index Cond: (m.order_id = z.root_order_id)
        Filter: ((order_id)::varchar = root_order_id)

I tried this with a mere 9.0 install and it wasn't having any of it. It 
completely ignored the red-herring WHERE clause except as a post-filter. 
I'm pretty sure that if it were possible to manifest itself to slap me 
for even trying, it would have done so.

I've noticed lots of little things like this recently, and I have to 
say, the planner has made huge improvements regardless of what 
perception may reflect sometimes. It still has some holes and room for 
improvement, but I just wanted to thank the devs for all their hard work.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@xxxxxxxxx

______________________________________________

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance