filter duplicates by priority

"Clark Slater" <pg@xxxxxxxxxxx> · Tue, 14 Jul 2009 10:04:12 -0400 (EDT)

Hello-

I am trying to use DISTINCT ON to filter out *potential* duplicate values
from a set of sub queries.  There are certain cases where there can be
repetitive part numbers that are priced differently.  I'm trying to start
with the full list, ordered by priority, and then remove any repeats that
have a lesser priority.

SELECT DISTINCT ON (part_number) * FROM (
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
) AS filter_duplicates ORDER BY priority,part_number

The above statement does not work because if I ORDER BY
priority,part_number then I have to DISTINCT ON (priority,part_number). 
But DISTINCT ON (priority, part_number) does not remove the repeated rows
because the same part_number with a different priority becomes a distinct
tuple.

Any suggestions are appreciated.

--------------------------------------

A more detailed explanation of my problem follows:

I am working on an e-commerce system that has different lists of products 
which contain many of the same products, at different prices.  When a user
searches for a certain set of part numbers, I would like the resulting 
products (and prices) to come from one of the lists, according to the 
list's priority.  Each user can have a different set of lists and 
priorities.

Table: product_lists
  id |             name             | priority |  user_id
-----+------------------------------+----------+----------
  5  |  General List of Products    |     2    |     23
  3  |  Different List of Products  |     3    |     23
 150 |  Customer-Specific Products  |     1    |     23

Table: products
 product_list_id | part_number |  price
-----------------+-------------+--------
        3        |   92298A    | 123.38
        5        |   92298A    | 111.04
        3        |   C39207    |  78.38
       150       |   C39207    |  67.93

Below is a simplified example of the structure of the query I am working 
with.  I realize that in this case, I could re-factor all of this into one
statement, but each sub-query in the real case has a more complex set of 
joins that determines the price.  The pricing joins from one sub-query to 
the next vary, so a collection of sub-queries seemed to be a logical 
solution.  Some part numbers are found in only one of the lists, while 
other part numbers are repeated across lists at different prices.

This is what I would *like* to say:

SELECT DISTINCT ON (part_number) * FROM (

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id=product_lists.id
AND product_list_id=150
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

UNION ALL

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=5
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

UNION ALL

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=3
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

) AS filter_duplicates ORDER BY priority,part_number

I need to ORDER BY priority so that, in the case of duplicates, the
product from the desired list is returned first.  Then the purpose of
DISTINCT ON is to filter out any duplicate part numbers that have a lesser
priority.  But, the above statement fails because the DISTINCT ON 
expression must match the leftmost ORDER BY expression.  However,
inserting the priority into the DISTINCT ON expression means that all of
the resulting tuples are unique, even though the part_number is the same.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general