1. >500K rows per day into the calls table.
2. Very rarely. The only common query is gathering users
that have not been called "today" (along with some other
qualifying criteria). More analytical queries/reports are done
for internal use and it is not essential that they be
lickity-split.
a. Usually just one connection at a time executes read
queries.
b. the users not called today query will be done once a
day.
c. Daily
d. All users for the last year (if you are asking about
retention). We will also rarely have to run for all time.
e. Not that I'm aware of (or seen) today.
f. For the simple queries we cannot afford latency between
calls and querying who was already called.
While I don't seem to be getting much support for it here
:D my write performance (which is most essential) has been
much better since I further normalized the tables and made it
so that NULL is never used and data is never updated (i.e. it
is immutable once it is written).
As for wanting to avoid NULLs I don't really know what to
say. Obviously some times NULL's are required. For this design
I don't really need them and they make the data harder to
reason about (because they are kind of open to
interpretation). They can also give you different results
than you sometimes expect (for example when looking for a non
matching key, you start having to inject some OR IS NULLs and
such). Also, the absence of null can make a lot of queries
more optimal). That said, I understand where you all are
coming from with de-normalization. It's definitely the path of
the least resistance. Our instinct is to want to see all
related data in a single table when possible.
The summary table was really a separate point from whether
or not people liked my schema or not -- I mean whether I
de-normalize as people are asking or not, there would still be
the question of a summary table for MAX and COUNT queries or
to not have a summary table for those. I probably made the
original question too open ended.