Re: Functionally dependent columns in SELECT DISTINCT

Willow Chargin <postgresql@xxxxxxxxxxxx> · Fri, 13 Sep 2024 09:12:12 -0700

Thanks both for your suggestions so far.

On Fri, Sep 13, 2024 at 8:43 AM David G. Johnston
<david.g.johnston@xxxxxxxxx> wrote:
>
> On Friday, September 13, 2024, Willow Chargin <postgresql@xxxxxxxxxxxx> wrote:
>>
>> In reality I really do want the ID columns of the
>> *most recent* items.
>
>
> Use a window function to rank them and pull out rank=1

Hmm, like this? noting that it's rank<=5, not rank=1:

    -- 1. rank all item-part combinations, densely since an item may
          have multiple parts
    -- 2. limit by rank, still retaining multiple copies of each item
    -- 3. de-duplicate IDs
    SELECT DISTINCT id FROM (
        SELECT id, dense_rank FROM (
            SELECT
                items.id,
                dense_rank() OVER (ORDER BY create_time DESC)
            FROM items JOIN parts ON items.id = parts.item_id
            WHERE part_id % 3 = 0
        ) q
        WHERE dense_rank <= 5
    ) q

I've done this before, but my experience is that it's usually far slower
because the rank is computed eagerly even for rows that don't match the
rank bound. And indeed here it takes 20% longer than even the slower
GROUP BY from before: https://explain.depesz.com/s/mQIi

> or use a lateral subquery to surgically (fetch first 1) retrieve the first row when sorted by recency descending.

I'm not sure that I see how to apply this when I need top-k, not top-1.