Re: Join query help

Michael Glaesemann <grzm@xxxxxxxxxxxxxxx> · Mon, 20 Aug 2007 20:27:19 -0500

On Aug 20, 2007, at 19:52 , novice wrote:

Try 2: Here are my DDLs & DMLs

Thanks for the data. It looks like the data you provided in the first  
set was a little different, and the queries I supplied in my previous  
message give you the results you want.

CREATE TABLE record
(
  record_id integer PRIMARY KEY,
  record_date timestamp with time zone NOT NULL
);

INSERT INTO record(record_id, record_date) VALUES ('1', '2007-07-23  
11:30:37');

Note: record_id is in integer, yet you're quoting the value ('1').  
This causes the server to cast the text value to an integer. Here  
it's not going to cause much of a problem, just a couple CPU cycles.  
In table definitions (and possibly queries?) it can cause the server  
to ignore otherwise usable indexes when planning queries. (I see  
below your other inserts also quote integer values: you should drop  
the quotes there as well.)

CREATE TABLE score
(
  score_id integer PRIMARY KEY,
  score_description character(7) NOT NULL

);

Unless you have a strict business rule that score_description can  
have no more than 7 characters, you should just use text instead of  
character(7). It provides you more freedom (for example, you don't  
have to change the column type if you ever want to use longer  
descriptions). character(7) does not gain you anything in terms of  
performance.

I'd also recommend adding a UNIQUE constraint to score_description.  
It appears your score_id has no meaning other than use as a surrogate  
key. There's nothing to prevent INSERT INTO score (score_id,  
score_description) VALUES (5, 'SAFE') which would be quite confusing,  
I should think. And if you don't have any particular reason to use a  
surrogate key, you could just use score_description as the primary  
key  of the table, dropping score_id altogether. it would reduce the  
number of joins you needed to do to have easily interpretable query  
results. (This all goes for your record table as well.)

I like to query for a result set that will also have the sum(score_id)
where score_id = '1' like the following

   week   | records | inspections | score
----------+---------+-------------+------
 2007, 30 |       2 |           8 |    6
 2007, 29 |       1 |           2 |    1
(2 rows)

(repeating from earlier post)

SELECT to_char(record_week, 'YYYY, IW') as formatted_record_week
    , count(DISTINCT record_id) AS record_count
    , count(DISTINCT observation_id) AS observation_count
    , safe_score_sum
FROM record_with_week
NATURAL JOIN observation
NATURAL JOIN (
    SELECT record_week, sum(score_id) as safe_score_sum
    FROM record_with_week
    NATURAL JOIN observation
    NATURAL JOIN score
    WHERE score_description = 'SAFE'
    GROUP BY record_week
    ) safe_observation
GROUP BY record_week, safe_score_sum;
formatted_record_week | record_count | observation_count |  
safe_score_sum
-----------------------+--------------+------------------- 
+----------------
2007, 29              |            1 |                 2  
|              1
2007, 30              |            2 |                 8  
|              6
(2 rows)

This will help identify that there were 6 SAFE observations found from
the 8 inspections on week 30.

Yeah, I thought so: you're actually looking for the *count* of SAFE  
observations, not the sum of the score_id for 'SAFE'. So what you  
really want is:

SELECT TO_CHAR(record_week, 'YYYY, IW') AS formatted_record_week
    , count(DISTINCT record_id) AS record_count
    , count(DISTINCT observation_id) AS observation_count
    , count(DISTINCT safe_observation_id) as safe_observation_count
FROM record_with_week
NATURAL JOIN observation
NATURAL JOIN (
    SELECT record_week
        , observation_id as safe_observation_id
    FROM record_with_week
    NATURAL JOIN observation
    NATURAL JOIN score
    WHERE score_description = 'SAFE'
    ) safe_observation
GROUP BY record_week;
formatted_record_week | record_count | observation_count |  
safe_observation_count
-----------------------+--------------+------------------- 
+------------------------
2007, 29              |            1 |                 2  
|                      1
2007, 30              |            2 |                 8  
|                      6
(2 rows)

Michael Glaesemann
grzm seespotcode net

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend