On 04/15/2014 09:53 AM, Robert DiFalco wrote:
Actually that was exactly the initial table design. There were more
fields because for my use case there were a lot more states, and
certain states carry additional data (for example, when a call goes from
answered to connected it also gets the user_id of the person being
connected to). So that one table started accumulating a LOT of columns,
which made it hard to reason about.
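Roughly, the single-table version looked something like this (the column
names here are just illustrative, not the real schema):

    -- one row per call, with nullable columns for every state's extra data
    CREATE TABLE call (
        call_id           bigserial PRIMARY KEY,
        created_at        timestamptz NOT NULL DEFAULT now(),
        state             text NOT NULL,   -- 'created', 'answered', 'connected', 'error', ...
        answered_at       timestamptz,     -- NULL unless the call was answered
        connected_at      timestamptz,     -- NULL unless the call was connected
        connected_user_id bigint,          -- NULL unless connected; who was connected to
        error_at          timestamptz      -- NULL unless the call ended in an error
    );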
The more normalized version has a couple of things going for it.
COUNT, MIN, MAX, etc. are very fast because I don't have to add
conditional NULL checks. Everything is an insert, so for the
millions of calls that get made the normalized schema was much more
efficient for writing. It was also easier to understand: the answered
table only has calls that were answered, the error table only has
calls that resulted in an error after being connected, and so on.
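Something like this, roughly (again, names are made up for illustration):

    -- base table: one row per call, insert-only
    CREATE TABLE call (
        call_id    bigserial PRIMARY KEY,
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- each state gets its own table; a row exists only if the call reached that state
    CREATE TABLE call_answered (
        call_id     bigint PRIMARY KEY REFERENCES call,
        answered_at timestamptz NOT NULL
    );

    CREATE TABLE call_connected (
        call_id      bigint PRIMARY KEY REFERENCES call,
        connected_at timestamptz NOT NULL,
        user_id      bigint NOT NULL      -- who the caller was connected to
    );

    CREATE TABLE call_error (
        call_id  bigint PRIMARY KEY REFERENCES call,
        error_at timestamptz NOT NULL
    );

    -- aggregates need no NULL filtering: every row in call_answered is an answered call
    SELECT count(*), min(answered_at), max(answered_at) FROM call_answered;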
I know this kind of gets into a religious area when discussing NULLs
and what level of normalization is appropriate, so I don't want to
spark any of that on this thread. But an insert-only approach, with no
updates or deletes, performed very well for large data sets.
That said, I could explore a compromise between the monolithic table
approach and the completely normalized set of tables approach. Thanks
for your input!
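For example, one compromise I could look at (just a sketch, not something
I've settled on) would keep the common lifecycle timestamps on the call row
and push only the state-specific extras, like the connected user, into a
side table:

    -- common timestamps stay on the call row (tolerating some NULLs),
    -- only the per-state extra data moves to a side table
    CREATE TABLE call (
        call_id      bigserial PRIMARY KEY,
        created_at   timestamptz NOT NULL DEFAULT now(),
        answered_at  timestamptz,
        connected_at timestamptz,
        error_at     timestamptz
    );

    CREATE TABLE call_connected_detail (
        call_id bigint PRIMARY KEY REFERENCES call,
        user_id bigint NOT NULL
    );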
I wonder if the "LOT of columns" are the bits that need to be parcelled
off as specific to one condition of a call?