Search Postgresql Archives

Avoiding double-counting in aggregates with more than one join?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi All,

I've noticed in the past that doing aggregates while joining to more than one table can sometimes give you unintended results. For example, suppose I have three tables: products, sales, and resupplies. In sales I track what I sell, and in resupplies I track my own purchases to increase inventory. Both have a foreign key to products. Now I want to run a report showing the total dollars sold for each product versus the total dollars spent for each product. I could try this:

    SELECT  p.id,
            SUM(s.price * s.qty) AS total_sold,
            SUM(r.price * r.qty) AS total_spent
    FROM    products p
    LEFT OUTER JOIN sales s
    ON      s.product_id = p.id
    LEFT OUTER JOIN resupplies r
    ON      r.product_id = p.id
    GROUP BY p.id
    ;

That seems pretty safe, but actually I get bad answers,
for example if some product has this data:

    sales
    -----
    sold 1 @ $2/ea

    resupplies
    ----------
    bought 1 @ $1/eq
    bought 2 @ $1/ea

Then pre-grouping I have this:

    p.id | s.qty | s.price | r.qty | r.price
    -----+-------+---------+-------+--------
       1 |     1 |      $2 |     1 |      $1
       1 |     1 |      $2 |     2 |      $1

You can see the problem is that I'm going to double-count my sales.
What I really want is this:

    p.id | s.qty | s.price | r.qty | r.price
    -----+-------+---------+-------+--------
       1 |     1 |      $2 |     1 |      $1
       1 |       |         |     2 |      $1

In the past I've always handled these situations by aggregating each table separately
and only then joining things together:

    WITH
    s AS (
      SELECT  product_id,
              SUM(price * qty) AS total_sold
      FROM    sales
      GROUP BY product_id) s
    ),
    r AS (
      SELECT  product_id,
              SUM(price * qty) AS total_spent
      FROM    resupplies
      GROUP BY product_id) r
    )
    SELECT  p.id,
            COALESCE(s.total_sold, 0),
            COALESCE(r.total_spent, 0)
    FROM    products p
    LEFT OUTER JOIN s
    ON      s.product_id = p.id
    LEFT OUTER JOIN r
    ON      r.product_id = p.id
    ;

Since I've guaranteed that each CTE includes at most one row per product,
this is safe from double-counting errors.

But is there a better way? My approach feels verbose
and harder to read. Also you have to type COALESCE a lot. :-)
Is there some different way of doing things I haven't thought of yet?

Also I wonder about the performance merging all these subqueries together.
Would the final merging be any faster if I had an ORDER BY in each CTE?

It seems like this pattern comes up a lot;
what have others done about it?

Thanks,
Paul


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux