Search Postgresql Archives

Re: designing time dimension for star schema

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I've done a lot of DSS architecture. A couple of thoughts:

- in most cases the ETL process figures out the time id's as part of the preparation and then does bulk loads into the fact tables I would be very concerned about performance of a trigger that fired for every row on the fact table

you mention you want to do data streaming instead of bulk loads, can you elaborate?


- When querying a star schema one of the performance features is the fact that all joins to the dimension tables are performed via a numeric key, such as:
"select * from fact, time_dim, geo_dim
  where fact.time_id = time_dim.time_id..."

In the case of this being a timestamp I suspect the performance would take a hit, depending on the size of your fact table and the scope/volume of your DSS queries this could easily be a show stopper based on the assumption that the database can do a numeric binary search much faster than a timestamp search





On 2/10/14, 9:45 AM, Mark Wong wrote:
Hello everybody,

I was wondering if anyone had any experiences they can share when
designing the time dimension for a star schema and the like.  I'm
curious about how well it would work to use a timestamp for the
attribute key, as opposed to a surrogate key, and populating the time
dimension with triggers on insert to the fact tables.  This is
something that would have data streaming in (as oppose to bulk
loading) and I think we want time granularity to the minute.

A simplified example:

-- Time dimension
CREATE TABLE time (
     datetime TIMESTAMP WITH TIME ZONE NOT NULL,
     day_of_week SMALLINT NOT NULL
);
CREATE UNIQUE INDEX ON time (datetime);

-- Fact
CREATE TABLE fact(
     datetime TIMESTAMP WITH TIME ZONE NOT NULL,
     FOREIGN KEY (datetime) REFERENCES time(datetime)
);

-- Function to populate the time dimension
CREATE OR REPLACE FUNCTION decompose_timestamp() RETURNS TRIGGER AS $$
BEGIN
     NEW.datetime = date_trunc('minutes', NEW.datetime);
     INSERT INTO time (datetime, day_of_week)
     VALUES (NEW.datetime, date_part('dow', NEW.datetime));
     RETURN NEW;
EXCEPTION
     WHEN unique_violation THEN
         -- Do nothing if the timestamp already exists in the dimension table.
         RETURN new;
END; $$
LANGUAGE 'plpgsql';

CREATE TRIGGER populate_time BEFORE INSERT
ON fact FOR EACH ROW
EXECUTE PROCEDURE decompose_timestamp();

Regards,
Mark





--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux