Just a note about column equivalence disarming the planner

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello folks,

This isn't really a problem, so much as an observation of just how much the internals have changed over the years. We've got an older version we're hoping to upgrade soon, and a developer asked me to optimize this today:

SELECT order_id
  FROM order
 WHERE order_id = root_order_id
   AND order_id IN (
         SELECT DISTINCT m.root_order_id
           FROM wacky_orders c
           JOIN order m USING (root_order_id)
          WHERE m.order_type = 'regular'
          GROUP BY m.root_order_id, m.route_id
         HAVING COUNT(1) > 1
       );

From what I could tell, the query was fine. But this part of the explain was confusing the hell out of me:

->  Seq Scan on order  (cost=0.00..218943.98 rows=24092 width=20)
     Filter: (order_id = root_order_id)

The thing is, that subquery there only produced 150 rows. So I shrugged and simplified further by making a temp table, and got this:

SELECT order_id
  FROM order m
  JOIN zany_orders z ON (m.order_id = z.root_order_id)
 WHERE m.order_id = m.root_order_id;

Which produced this:

Merge Join  (cost=220705.29..220826.19 rows=1 width=10)
 Merge Cond: (m.order_id = z.root_order_id)
 ->  Sort  (cost=220697.42..220757.65 rows=24092 width=20)
       Sort Key: m.order_id
       ->  Seq Scan on order m  (cost=0.00..218943.98 rows=24092 width=20)
             Filter: (order_id = root_order_id)
 ->  Sort  (cost=7.87..8.24 rows=149 width=11)
       Sort Key: z.root_order_id
       ->  Seq Scan on zany_orders z  (cost=0.00..2.49 rows=149 width=11)

Ok, now it's just screwing with me. The order table has about 5M rows, and this is clearly not a good idea, here. But then I took a closer look. Why did it decide to filter based on a condition 90% of the table fits, and then *merge* those results in with the 150-row temp table?

So, for giggles, I cast a column type to tell the planner it shouldn't consider the columns equivalent:

SELECT master_order_id
  FROM order m
  JOIN zany_orders z ON (m.order_id = z.root_order_id)
 WHERE m.order_id::VARCHAR = m.root_order_id;

And voila:

Nested Loop  (cost=0.00..839.82 rows=1 width=8)
  ->  Seq Scan on zany_orders z  (cost=0.00..2.49 rows=149 width=11)
-> Index Scan using order_pkey on order m (cost=0.00..5.60 rows=1 width=8)
        Index Cond: (m.order_id = z.root_order_id)
        Filter: ((order_id)::varchar = root_order_id)

I tried this with a mere 9.0 install and it wasn't having any of it. It completely ignored the red-herring WHERE clause except as a post-filter. I'm pretty sure that if it were possible to manifest itself to slap me for even trying, it would have done so.

I've noticed lots of little things like this recently, and I have to say, the planner has made huge improvements regardless of what perception may reflect sometimes. It still has some holes and room for improvement, but I just wanted to thank the devs for all their hard work.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@xxxxxxxxx

______________________________________________

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux