Re: Oracle v. Postgres 9.0 query performance

Vitalii Tymchyshyn <tivv00@xxxxxxxxx> · Wed, 08 Jun 2011 18:52:05 +0300

08.06.11 18:40, Tony Capobianco ÐÐÐÐÑÐÐ(ÐÐ):
pg_dw=# set enable_nestloop =0;
SET
Time: 0.165 ms
pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;
                                        QUERY
PLAN
-----------------------------------------------------------------------------------------
  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
    ->   Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
          Hash Cond: (o.emailcampaignid = s.emailcampaignid)
          ->   Seq Scan on openactivity o  (cost=0.00..3529930.67
rows=192540967 width=12)
          ->   Hash  (cost=8.79..8.79 rows=479 width=4)
                ->   Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
width=4)

Yikes.  Two sequential scans.

Yep. Can you see another options? Either you take each of 479 records 
and try to find matching records in another table using index (first 
plan), or you take both two tables fully (seq scan) and join - second plan.
First plan is better if your large table is clustered enough on 
emailcampaignid field (479 index reads and 479 sequential table reads). 
If it's not, you may get a 479 table reads transformed into a lot or 
random reads.
BTW: May be you have different data clustering in PostgreSQL & Oracle? 
Or data in Oracle may be "hot" in caches?
Also, sequential scan is not too bad thing. It may be cheap enough to 
read millions of records if they are not too wide. Please show "select 
pg_size_pretty(pg_relation_size('openactivity'));" Have you tried to 
explain analyze second plan?

Best regards, Vitalii Tymchyshyn

On Wed, 2011-06-08 at 11:33 -0400, Tom Lane wrote:
Tony Capobianco<tcapobianco@xxxxxxxxxxxxxx>  writes:
pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;
                                                  QUERY
PLAN
-------------------------------------------------------------------------------------------------------------
  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
    ->   Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
          ->   Index Scan using ecr_sents_ecid_idx on ecr_sents s
(cost=0.00..38.59 rows=479 width=4)
          ->   Index Scan using openact_emcamp_idx on openactivity o
(cost=0.00..3395.49 rows=19372 width=12)
                Index Cond: (o.emailcampaignid = s.emailcampaignid)
(5 rows)
Should this query be hashing the smaller table on Postgres rather than
using nested loops?
Yeah, seems like it.  Just for testing purposes, do "set enable_nestloop
= 0" and see what plan you get then.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance