Search Postgresql Archives

hash join performance question

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi

testdb3=# \d csischema.dim_company;
               Table "csischema.dim_company"
     Column      |            Type             | Modifiers 
-----------------+-----------------------------+-----------
 company_id      | integer                     | not null
 company_name    | character varying(100)      | 
 city            | character varying(100)      | 
 state           | character varying(100)      | 
 postal_code     | character varying(100)      | 
 country         | character varying(100)      | 
 latitude        | double precision            | 
 longitude       | double precision            | 
 update_datetime | timestamp without time zone | 
 company_source  | character varying(1)        | 
Indexes:
    "dim_company_pkey" PRIMARY KEY, btree (company_id)

testdb3=# \d woc.dim_company;
                  Table "woc.dim_company"
     Column      |            Type             | Modifiers 
-----------------+-----------------------------+-----------
 company_id      | integer                     | not null
 company_name    | character varying(100)      | 
 city            | character varying(100)      | 
 state           | character varying(100)      | 
 postal_code     | character varying(100)      | 
 country         | character varying(100)      | 
 latitude        | double precision            | 
 longitude       | double precision            | 
 update_datetime | timestamp without time zone | 
 company_source  | character varying(1)        | 
Indexes:
    "dim_company_pkey" PRIMARY KEY, btree (company_id)

testdb3=# select count(*) from csischema.dim_company;
  count  
---------
 1786376
(1 row)

testdb3=# select count(*) from woc.dim_company;
 count 
-------
 18980
(1 row)

woc.dim_company is a subset of csischema.dim_company meaning all company_id from woc.dim_company  are in csischema.dim_company
Ratio is around 1%



        
SELECT a.company_id  FROM csischema.dim_company a, woc.dim_company b
WHERE a.company_id = b.company_id;


testdb3=# explain analyze SELECT a.company_id  FROM csischema.dim_company a, woc.dim_company b
testdb3-# WHERE a.company_id = b.company_id;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=711.05..54938.35 rows=18980 width=4) (actual time=34.067..1118.603 rows=18980 loops=1)
   Hash Cond: (a.company_id = b.company_id)
   ->  Seq Scan on dim_company a  (cost=0.00..47097.82 rows=1850582 width=4) (actual time=0.013..523.249 rows=1786376 loops=1)
   ->  Hash  (cost=473.80..473.80 rows=18980 width=4) (actual time=20.203..20.203 rows=18980 loops=1)
         Buckets: 32768  Batches: 1  Memory Usage: 924kB
         ->  Seq Scan on dim_company b  (cost=0.00..473.80 rows=18980 width=4) (actual time=0.007..10.076 rows=18980 loops=1)
 Planning time: 0.511 ms
 Execution time: 1121.068 ms
(8 rows)


I was expecting at least the PK of csischema.dim_company to be used . In another DBMS that was the case. The larger table , csischema.dim_company used the PK.


Any hints, thoughts what am I not seing ?


Thank you
Armand



-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux