On 04/15/2014 09:53 AM, Robert DiFalco wrote:
Actually that was exactly the initial table design. There were more
fields because for my use case there were a lot more states, and
certain states carry additional data (for example, when a call goes from
answered to connected it also gets the user_id of the person being
connected to). So that one table started accumulating a LOT of columns,
which made it hard to reason about.
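Roughly, the single-table version looked something like this (the column
names here are just illustrative, not the real schema):

    -- one row per call, with nullable columns for every state's extra data
    CREATE TABLE call (
        call_id           bigserial PRIMARY KEY,
        created_at        timestamptz NOT NULL DEFAULT now(),
        state             text NOT NULL,   -- 'created', 'answered', 'connected', 'error', ...
        answered_at       timestamptz,     -- NULL unless the call was answered
        connected_at      timestamptz,     -- NULL unless the call was connected
        connected_user_id bigint,          -- NULL unless connected; who was connected to
        error_at          timestamptz      -- NULL unless the call ended in an error
    );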
The more normalized version has a couple of things going for it.
COUNT, MIN, MAX, etc. are very fast because I don't have to add
conditional NULL checks. Everything is an insert, so for the
millions of calls that get made the normalized schema was much more
efficient for writing. It was also easier to understand: the answered
table only has calls that were answered, the error table only has
calls that resulted in an error after being connected, and so on.
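Something like this, roughly (again, names are made up for illustration):

    -- base table: one row per call, insert-only
    CREATE TABLE call (
        call_id    bigserial PRIMARY KEY,
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- each state gets its own table; a row exists only if the call reached that state
    CREATE TABLE call_answered (
        call_id     bigint PRIMARY KEY REFERENCES call,
        answered_at timestamptz NOT NULL
    );

    CREATE TABLE call_connected (
        call_id      bigint PRIMARY KEY REFERENCES call,
        connected_at timestamptz NOT NULL,
        user_id      bigint NOT NULL      -- who the caller was connected to
    );

    CREATE TABLE call_error (
        call_id  bigint PRIMARY KEY REFERENCES call,
        error_at timestamptz NOT NULL
    );

    -- aggregates need no NULL filtering: every row in call_answered is an answered call
    SELECT count(*), min(answered_at), max(answered_at) FROM call_answered;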
I know this kind of gets into a religious area when discussing NULLs
and what level of normalization is appropriate, so I don't want to
spark any of that on this thread. But an insert-only approach, with no
updates or deletes, performed very well for large data sets.
That said, I could explore a compromise between the monolithic table
approach and the completely normalized set of tables approach. Thanks
for your input!
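For example, one compromise I could look at (just a sketch, not something
I've settled on) would keep the common lifecycle timestamps on the call row
and push only the state-specific extras, like the connected user, into a
side table:

    -- common timestamps stay on the call row (tolerating some NULLs),
    -- only the per-state extra data moves to a side table
    CREATE TABLE call (
        call_id      bigserial PRIMARY KEY,
        created_at   timestamptz NOT NULL DEFAULT now(),
        answered_at  timestamptz,
        connected_at timestamptz,
        error_at     timestamptz
    );

    CREATE TABLE call_connected_detail (
        call_id bigint PRIMARY KEY REFERENCES call,
        user_id bigint NOT NULL
    );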
I wonder if the "LOT of columns" are the bits that need to be parcelled
off as specific to one condition of a call?