Re: Oracle v. Postgres 9.0 query performance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



08.06.11 18:40, Tony Capobianco ÐÐÐÐÑÐÐ(ÐÐ):
pg_dw=# set enable_nestloop =0;
SET
Time: 0.165 ms
pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;
                                        QUERY
PLAN
-----------------------------------------------------------------------------------------
  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
    ->   Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
          Hash Cond: (o.emailcampaignid = s.emailcampaignid)
          ->   Seq Scan on openactivity o  (cost=0.00..3529930.67
rows=192540967 width=12)
          ->   Hash  (cost=8.79..8.79 rows=479 width=4)
                ->   Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
width=4)

Yikes.  Two sequential scans.

Yep. Can you see another options? Either you take each of 479 records and try to find matching records in another table using index (first plan), or you take both two tables fully (seq scan) and join - second plan. First plan is better if your large table is clustered enough on emailcampaignid field (479 index reads and 479 sequential table reads). If it's not, you may get a 479 table reads transformed into a lot or random reads. BTW: May be you have different data clustering in PostgreSQL & Oracle? Or data in Oracle may be "hot" in caches? Also, sequential scan is not too bad thing. It may be cheap enough to read millions of records if they are not too wide. Please show "select pg_size_pretty(pg_relation_size('openactivity'));" Have you tried to explain analyze second plan?

Best regards, Vitalii Tymchyshyn



On Wed, 2011-06-08 at 11:33 -0400, Tom Lane wrote:
Tony Capobianco<tcapobianco@xxxxxxxxxxxxxx>  writes:
pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;
                                                  QUERY
PLAN
-------------------------------------------------------------------------------------------------------------
  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
    ->   Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
          ->   Index Scan using ecr_sents_ecid_idx on ecr_sents s
(cost=0.00..38.59 rows=479 width=4)
          ->   Index Scan using openact_emcamp_idx on openactivity o
(cost=0.00..3395.49 rows=19372 width=12)
                Index Cond: (o.emailcampaignid = s.emailcampaignid)
(5 rows)
Should this query be hashing the smaller table on Postgres rather than
using nested loops?
Yeah, seems like it.  Just for testing purposes, do "set enable_nestloop
= 0" and see what plan you get then.


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux