On Tue, Apr 15, 2014 at 10:56 AM, Chris Curvey <chris@xxxxxxxxxxxxxxx> wrote:
On Mon, Apr 14, 2014 at 12:27 PM, Robert DiFalco <robert.difalco@xxxxxxxxx> wrote:
I have several related tables that represent a call state. Let's think of these as phone calls to simplify things. Sometimes I need to determine the last time a user was called, the last time a user answered a call, or the last time a user completed a call.The basic schema is something like this:CREATE TABLE calls (id BIGINT NOT NULL, // sequence generatoruser_id BIGINT NOT NULL,called TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,PRIMARY KEY (id),FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE);CREATE TABLE calls_answered (id BIGINT NOT NULL,answered TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,PRIMARY KEY (id),FOREIGN KEY (id) REFERENCES calls(id) ON DELETE CASCADE);And so on for calls_connected, calls_completed, call_errors, etc.Occasionally I will want to know things like "When was the last time a user answered a call" or "How many times has a user been called".I can do these queries using a combination of MAX or COUNT. But I'm concerned about the performance.SELECT MAX(a.id)WHERE c.user_id = ?;Or the number of answered calls:Sometimes I might want to get this data for a whole bunch of users. For example, "give me all users whose have not answered a call in the last 5 days." Or even "what percentage of users called actually answered a call." This approach could become a performance issue. So the other option is to create a call_summary table that is updated with triggers.The summary table would need fields like "user_id", "last_call_id", "call_count", "last_answered_id", "answered_count", "last_completed_id", "last_completed_count", etc.My only issue with a summary table is that I don't want a bunch of null fields. For example, if the user was called but they have never answered at call then the last_call_id and call_count fields on the summary table would be non-NULL but the last_answer_id and answer_count fields WOULD be NULL. But over time all fields would eventually become non-NULL.So that leads me to a summary table for EACH call state. Each summary table would have a user id, a ref_id, and a count -- one summary table for each state e.g. call_summary, call_answered_summary, etc.This approach has the down side that it creates a lot of tables and triggers. It has the upside of being pretty efficient without having to deal with NULL values. It's also pretty easy to reason about.So for my question -- is the choice between these a personal preference sort of thing or is there a right or wrong approach? Am I missing another approach that would be better? I'm okay with SQL but I'm not expert so I'm not sure if there is an accepted DESIGN PATTERN for this that I am missing.Thanks!
(Sorry, fat-fingered and hit "send too early"...)
CREATE TABLE calls (
id BIGINT NOT NULL, // sequence generator
user_id BIGINT NOT NULL,
called TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
answered TIMESTAMPTZ NULL,
connected TIMESTAMPTZ NULL,
completed TIMESTAMPTZ NULL,
PRIMARY KEY (id),
FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
);
Then your queries end up looking like this:
--last time john answered
FROM calls
where answered is not null
and user_id = ?
-- users that have not answered a call in the last five days (I can think of a few ways to interpret that phrase)
select myusers.*
from myusers
where not exists
( select *
from calls
where calls.user_id = myusers.user_id
and answered >= <five days ago>)
-- average ring time
select avg(extract ('seconds' from called - answered))
where answered is not null
I asked the Internet how to train my cat, and the Internet told me to get a dog.