Re: Offending My Tender Sensibilities -OR- OLTP on a Star Schema

Merlin Moncure <mmoncure@xxxxxxxxx> · Thu, 6 Feb 2014 08:51:37 -0600

On Tue, Feb 4, 2014 at 10:06 PM, Roy Anderson <roy.anderson@xxxxxxxxx> wrote:
> We have an OLTP database and no data warehouse. We are currently
> planning out a build for a data warehouse however (possibly using
> Hadoop). "X" is recommending that we convert our current, normalized
> OLTP database into a flattened Star Schema.
>
> The primary rationale for this OLTP flattening is that since most of
> the app calls to the database are SELECT statements, the DB should be
> optimized for that. A peripheral rationale I've heard is that the
> 1NF-3NF database is hard to understand sometimes.
>
> I'm not going to state what I believe. I would like to hear my
> esteemed colleagues take on this situation. What would you do? What
> would you say? Is there anything you would investigate? In short, what
> say you?

couple quick thoughts:
*) what kinds of problems is the data warehouse going to solve the
current database is not solving?

*) Have you conclusively proven the current database can't solve those problems?

*) the phrase '1NF-3NF is poor for very general application style X'
is a typical comment from someone who read a book or an article on
databases without having a lot of practical experience in implementing
systems.  Experienced database guys rarely talk in such terms; they
slip in an out of 'normal forms' naturally in order to meet the
demands of whatever task they may be happening to solve.

*) a deficit in knowledge in scaling large data is typically made
worse, not better, by switching up the software stack, particularly if
you don't have very clear idea of what problems you are trying to
solve by switching and what problems you will run into with the new
stack

*) Hardware is on the march (SSD).  Large data scaling is often a
function of hardware, not software.  Scaling out to multiple systems
from a monolithic architecture is not a decision to be taken lightly
and the calculus of that decision must be constantly revisited in the
face of hardware improvements.  Scaling horizontally may ultimately
cost more in hardware in the end.

*) Anyone who has ever uttered the statement: "databases are slower
than technology X because databases utilize disks and technology X
stores everything in memory" has absolutely no concept of how
databases (or operating systems) work.

*) Star schemas are great for supporting certain kinds of OLAP tools
but are IMNSHO overrated.  If your reporting requirements are
extremely generic or not very well defined you may have to use that or
a similar pattern however.

*) There is no rule that says if you use databases there must only be
one.  Partitioning databases for large analytical queries is a
reasonable path if you have tons of data.  You can then have the
application or (my preference) dblink then aggregate data together.

*) Do not consider any advice to implement exotic storage backend from
someone that has not previously implemented that same technology on a
similar scale on a previous project, ever.  Data of large scale is
hard.  Installing magical tool X often besides not solving the problem
gives you another difficult problem to solve.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general