On Tue, Feb 4, 2014 at 10:06 PM, Roy Anderson <roy.anderson@xxxxxxxxx> wrote: > We have an OLTP database and no data warehouse. We are currently > planning out a build for a data warehouse however (possibly using > Hadoop). "X" is recommending that we convert our current, normalized > OLTP database into a flattened Star Schema. > > The primary rationale for this OLTP flattening is that since most of > the app calls to the database are SELECT statements, the DB should be > optimized for that. A peripheral rationale I've heard is that the > 1NF-3NF database is hard to understand sometimes. > > I'm not going to state what I believe. I would like to hear my > esteemed colleagues take on this situation. What would you do? What > would you say? Is there anything you would investigate? In short, what > say you? couple quick thoughts: *) what kinds of problems is the data warehouse going to solve the current database is not solving? *) Have you conclusively proven the current database can't solve those problems? *) the phrase '1NF-3NF is poor for very general application style X' is a typical comment from someone who read a book or an article on databases without having a lot of practical experience in implementing systems. Experienced database guys rarely talk in such terms; they slip in an out of 'normal forms' naturally in order to meet the demands of whatever task they may be happening to solve. *) a deficit in knowledge in scaling large data is typically made worse, not better, by switching up the software stack, particularly if you don't have very clear idea of what problems you are trying to solve by switching and what problems you will run into with the new stack *) Hardware is on the march (SSD). Large data scaling is often a function of hardware, not software. Scaling out to multiple systems from a monolithic architecture is not a decision to be taken lightly and the calculus of that decision must be constantly revisited in the face of hardware improvements. Scaling horizontally may ultimately cost more in hardware in the end. *) Anyone who has ever uttered the statement: "databases are slower than technology X because databases utilize disks and technology X stores everything in memory" has absolutely no concept of how databases (or operating systems) work. *) Star schemas are great for supporting certain kinds of OLAP tools but are IMNSHO overrated. If your reporting requirements are extremely generic or not very well defined you may have to use that or a similar pattern however. *) There is no rule that says if you use databases there must only be one. Partitioning databases for large analytical queries is a reasonable path if you have tons of data. You can then have the application or (my preference) dblink then aggregate data together. *) Do not consider any advice to implement exotic storage backend from someone that has not previously implemented that same technology on a similar scale on a previous project, ever. Data of large scale is hard. Installing magical tool X often besides not solving the problem gives you another difficult problem to solve. merlin -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general