On 2/4/2014 10:06 PM, Roy Anderson wrote:
We have an OLTP database and no data warehouse. We are currently planning out a build for a data warehouse however (possibly using Hadoop). "X" is recommending that we convert our current, normalized OLTP database into a flattened Star Schema. The primary rationale for this OLTP flattening is that since most of the app calls to the database are SELECT statements, the DB should be optimized for that. A peripheral rationale I've heard is that the 1NF-3NF database is hard to understand sometimes. I'm not going to state what I believe. I would like to hear my esteemed colleagues take on this situation. What would you do? What would you say? Is there anything you would investigate? In short, what say you? Thanks for your time.
I think NF is more about correctness than about speed. People saying they denormalized a database to improve speed also de-corrected it, and I doubt they improved the speed.
The only experience I have with data warehousing is storing my apache logs in PG. I have have many mega-millions of rows. So many that I don't run select count(*) because it takes forever.
I have an index on hitdate, so I can pull any small number of records based on time very quickly, but I can never select all of them. When I add detail rows to the table I also add to a table of hourly summaries. My graphs are run off the summary tables and pulling a few months worth of data is very quick.
So I guess it would come down to how you need to access your data warehouse. If you need to "learn" things by hitting every detail record for all time, then PG isn't gonna work well for you.
If you can do indexed queries and return small subsets, PG will work great. If you can keep ongoing summaries you'll be fine too. Rebuilding a summary table (or creating a new one) is painful, but possible as long as your not in a huge hurry.
-Andy -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general