Re: PostgreSQL runs a query much slower than BDE and MySQL

Scott Lamb <slamb@xxxxxxxxx> · Thu, 17 Aug 2006 15:00:52 -0700

Peter, I compared these using the data you supplied on my PostgreSQL  
8.1.4 system:

On Aug 17, 2006, at 12:09 PM, Scott Lamb wrote:

On Aug 16, 2006, at 3:51 PM, Tom Lane wrote:
/* Select all sheep who's most recent transfer was into the  
subject flock */
SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in
FROM SHEEP_FLOCK f1 JOIN
    /* The last transfer date for each sheep */
    (SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date
    FROM  SHEEP_FLOCK f
    GROUP BY f.regn_no) f2
ON f1.regn_no = f2.regn_no
WHERE f1.flock_no = '1359'
AND f1.transfer_date = f2.last_xfer_date

        QUERY PLAN
------------------------------------------------------------------------ 
------------------------------------------------------------------------ 
------------------------
Unique  (cost=2575.07..2575.08 rows=1 width=36) (actual  
time=1083.579..1083.696 rows=32 loops=1)
   ->  Sort  (cost=2575.07..2575.07 rows=1 width=36) (actual  
time=1083.576..1083.613 rows=32 loops=1)
         Sort Key: f1.regn_no, f1.transfer_date
         ->  Nested Loop  (cost=1364.00..2575.06 rows=1 width=36)  
(actual time=287.895..1083.297 rows=32 loops=1)
               ->  HashAggregate  (cost=1364.00..1366.50 rows=200  
width=36) (actual time=262.345..337.940 rows=38815 loops=1)
                     ->  Seq Scan on sheep_flock f   
(cost=0.00..1116.00 rows=49600 width=36) (actual time=0.005..119.282  
rows=81802 loops=1)
               ->  Index Scan using sheep_flock_pkey on sheep_flock  
f1  (cost=0.00..6.02 rows=1 width=36) (actual time=0.016..0.016  
rows=0 loops=38815)
                     Index Cond: (((f1.regn_no)::text =  
("outer".regn_no)::text) AND ((f1.flock_no)::text = '1359'::text) AND  
(f1.transfer_date = "outer"."?column2?"))
Total runtime: 1085.115 ms
(9 rows)

This seems pretty closely related to this recent thread:
http://archives.postgresql.org/pgsql-performance/2006-08/msg00220.php
in which the OP is doing a very similar kind of query in almost  
exactly
the same way.

I can't help thinking that there's probably a better way to phrase  
this
type of query in SQL, though it's not jumping out at me what that is.

I don't know about better, but I tend to phrase these in a quite  
different way that's (hopefully) equivalent:

select    latest.regn_no,
          latest.transfer_date as date_in
from      sheep_flock latest
where     not exists (
          select    'x'
          from      sheep_flock even_later
          where     latest.regn_no = even_later.regn_no
            and     latest.transfer_date < even_later.transfer_date)
  and     latest.flock_no = '1359'

There's no MAX() or DISTINCT here, so maybe this is easier to  
optimize?

                                                                      Q 
UERY PLAN
------------------------------------------------------------------------ 
------------------------------------------------------------------------ 
------
Bitmap Heap Scan on sheep_flock latest  (cost=764.60..2185.05  
rows=124 width=36) (actual time=11.915..13.800 rows=32 loops=1)
   Recheck Cond: ((flock_no)::text = '1359'::text)
   Filter: (NOT (subplan))
   ->  Bitmap Index Scan on sheep_flock_pkey  (cost=0.00..764.60  
rows=248 width=0) (actual time=10.950..10.950 rows=127 loops=1)
         Index Cond: ((flock_no)::text = '1359'::text)
   SubPlan
     ->  Index Scan using sheep_flock_pkey on sheep_flock  
even_later  (cost=0.00..317.49 rows=83 width=0) (actual  
time=0.016..0.016 rows=1 loops=127)
           Index Cond: ((($0)::text = (regn_no)::text) AND ($1 <  
transfer_date))
Total runtime: 13.902 ms
(9 rows)

seems to return the same data in two orders of magnitude less time.

--
Scott Lamb <http://www.slamb.org/>