On 1/23/07, Tobias Brox <tobias@xxxxxxxxxxxxx> wrote:
Does anyone have experience with using postgres for data warehousing? Right, I saw one post suggestion to use mysql for a mostly read-only database ... but anyway, I think it's not a question to change the database platform for this project, at least not today ;-) Ralph Kimball seems to be some kind of guru on data warehousing, and in his books he's strongly recommending to have a date dimension - simply a table describing all dates in the system, and having attributes for what day of the week it is, month, day of the month, week number, bank holiday, anything special, etc. Well, it does make sense if adding lots of information there that cannot easily be pulled out from elsewhere - but as for now, I'm mostly only interessted in grouping turnover/profit by weeks/months/quarters/years/weekdays. It seems so much bloated to store this information, my gut feeling tells it should be better to generate them on the fly. Postgres even allows to create an index on an expression. The question is ... I'm curious about what would yield the highest performance - when choosing between: select extract(week from created), ... from some_table where ... group by extract(week from created), ... sort by extract(week from created), ... and: select date_dim.week_num, ... from some_table join date_dim ... where ... group by date_dim.week_num, ... sort by date_dim, week_num, ... The date_dim table would eventually cover ~3 years of operation, that is less than 1000 rows.
In my opinion, I would make a date_dim table for this case. I would strongly advice against making a date_id field, just use the date itself as the p-key (i wouldn't bother with RI links to the table though). I would also however make a function and use this to make the record: create or replace function make_date_dim(in_date date) returns date_dim as $$ [...] And make date_dim records this way: insert into date_dim select * from make_dim('01/01/2001'::date); (or pre-insert with generate_series). now you get the best of both worlds: you can join to the table for the general case or index via function for special case indexes. for example suppose you had to frequently count an account's sales by fiscal year quarter irrespective of year: create index q_sales_idx on account_sale(account_no, (make_dim(sale_date)).fiscal_quarter); also you can use the function in place of a join if you want. In some cases the join may be better, though. merlin