New to Postgres and am prototyping a migration from Oracle
to Postgres 9.0.1 on Linux. Starting with the data warehouse. Current process
is to load the data from our OLTP (Oracle), dump it into another instance of Oracle
for staging and manipulation, then extract it and load it into Infobright. I am
trying to replace the Oracle instance used for staging and
manipulation with Postgres. Kettle (PDI), a Java ETL tool, is used for this
process. Came across a problem I find perplexing. I recreated the
dimensional tables in Oracle and the fields that are integers in Oracle became
integers in Postgres. Was experiencing terrible performance during
the load and narrowed down to a particular dimensional lookup problem. The table dim_carrier holds about 80k rows. You can see the actual
query issued by Kettle below, but basically I am looking up using the business
key from our OLTP system. This field is carrier_source_id and is
indexed as you can see below. If I change this field from an integer to a real,
I get about a 70x increase in performance of the query. The
EXPLAIN ANALYZE output is nearly identical, except for the casting of 1 to a
real when the column is a real. In real life, this query is actually bound and
parameterized, but I wished to simplify things a bit here (and don't yet know
how to EXPLAIN ANALYZE a parameterized query). Now in terms of actual performance, the same query
executed about 25k times takes 7 seconds with the real column, and 500 seconds
with the integer column. What gives here? Seems like integer (or serial) is a pretty
common choice for primary key columns, and therefore what I'm experiencing must
be an anomoly. Table "hits_olap.dim_carrier" Column | Type | Modifiers
-------------------+-----------------------------+----------- carrier_id | integer | not null dim_version | smallint | dim_effect_date | timestamp without time zone | dim_expire_date | timestamp without time zone | carrier_source_id | integer | carrier_name | character varying(30) | carrier_type | character varying(30) | carrier_scac | character varying(4) | carrier_currency | character varying(3) | current_row | smallint | default 0 Indexes: "dim_carrier_pk" PRIMARY KEY, btree
(carrier_id) "idx_dim_carrier_lookup" btree
(carrier_source_id) VACUUM ANALYZE REINDEX EXPLAIN ANALYZE SELECT CARRIER_ID, DIM_VERSION FROM
HITS_OLAP.DIM_CARRIER WHERE CARRIER_SOURCE_ID = '1' AND now() >=
DIM_EFFECT_DATE AND now() < DIM_EXPIRE_DATE;
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Index Scan using idx_dim_carrier_lookup on dim_carrier
(cost=0.00..12.10 rows=2 width=6) (actual time=0.076..0.077 rows=1 loops=1) Index Cond: (carrier_source_id = 1) Filter: ((now() >= dim_effect_date) AND (now() <
dim_expire_date)) Total runtime: 0.108 ms (4 rows) ALTER TABLE ALTER TABLE Table "hits_olap.dim_carrier" Column | Type | Modifiers
-------------------+-----------------------------+----------- carrier_id | integer | not null dim_version | smallint | dim_effect_date | timestamp without time zone | dim_expire_date | timestamp without time zone | carrier_source_id | real | carrier_name | character varying(30) | carrier_type | character varying(30) | carrier_scac | character varying(4) | carrier_currency | character varying(3) | current_row | smallint | default 0 Indexes: "dim_carrier_pk" PRIMARY KEY, btree
(carrier_id) "idx_dim_carrier_lookup" btree
(carrier_source_id) VACUUM ANALYZE REINDEX EXPLAIN ANALYZE SELECT CARRIER_ID, DIM_VERSION FROM
HITS_OLAP.DIM_CARRIER WHERE CARRIER_SOURCE_ID = '1' AND now() >=
DIM_EFFECT_DATE AND now() < DIM_EXPIRE_DATE; QUERY
PLAN ------------------------------------------------------------------------------------------------------------------------------------- Index Scan using idx_dim_carrier_lookup on dim_carrier (cost=0.00..12.10
rows=2 width=6) (actual time=0.068..0.069 rows=1 loops=1) Index Cond: (carrier_source_id = 1::real) Filter: ((now() >= dim_effect_date) AND (now() <
dim_expire_date)) Total runtime: 0.097 ms (4 rows) Thanks for the help, Dave Greco |