Re: Window functions and index usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/04/2011 04:27 PM, Robert Klemme wrote:
On Tue, Oct 4, 2011 at 11:39 AM, Anssi Kääriäinen
<anssi.kaariainen@xxxxxx>  wrote:
I have the following setup:

create table test(id integer, seq integer);
insert into test select generate_series(0, 100), generate_series(0, 1000);
create unique index test_idx on test(id, seq);
analyze test;

Now I try to fetch the latest 5 values per id, ordered by seq from the
table:

select * from (
select id, seq, row_number() over (partition by id order by seq)
  from test
  where id in (1, 2, 3)
) where row_number()<= 5;
It seems this fetches the *first* 5 values per id - and not the
latest.  For that you would need to "order by seq desc" in the window
and probably also in the index.

Yeah. Sorry, wrong order. And the last line is wrong, it should be ") tmp where row_number <= 5;".
This does not use the index on id, seq for sorting the data. It uses a
bitmap index scan, and sequential scan when issued SET enable_bitmapscan to
false. Tested both on git head and 8.4.8. See end of post for plans. It
seems it would be possible to fetch the first values per id using the index,
or at least skip the sorting.
Just guessing: since row_number is an analytic function and it can be
combined with any type of windowing only in "rare" cases do the
ordering of index columns and the ordering in the window align.  So
while your particular use case could benefit from this optimization
the overall judgement might be that it's not worthwhile to make the
optimizer more complex to cover this case - or I fail to see the more
general pattern here. :-)

I think there are common use cases for this - see end of message for an example.
Is there some other way to run the query so that it would use the index? Is
there plans to support the index usage for the above query assuming that it
is possible to use the index for that query?

The real world use case would be to fetch latest 5 threads per discussion
forum in one query. Or fetch 3 latest comments for all patches in given
commit fest in single query.
Is it really that realistic that someone wants the latest n entries
for *all* threads / patches?  It seems since this can result in a very
large data set this is probably not the type of query which is done
often.
The idea is that the dataset isn't that large. And you don't have to fetch them for all threads / patches. You might fetch them only for patches in currently viewed commit fest. See https://commitfest.postgresql.org/action/commitfest_view?id=12 for one such use. What I have in mind is fetching first all the patches in the commit fest in one go. Then issue query which would look something like:
 select * from
(select comment_data, row_number() over (partition by patch_id order by comment_date desc)
       from patch_comments
     where patch_id in (list of patch_ids fetched in first query)
   ) tmp where row_number <= 3;

Now you have all the data needed for the first column in the above mentioned page.

I guess the commit fest application is fetching all the comments for the patches in the commit fest in one query, and then doing the limit in application code. This is fine because there aren't that many comments per patch. But if you have dozens of forums and thousands of threads per forum you can't do that.

This is useful in any situation where you want to show n latest entries instead of the last entry in a list view. Latest modifications to an object, latest commits to a file, latest messages to a discussion thread or latest payments per project. Or 5 most popular videos per category, 10 highest paid employees per department and so on.

 - Anssi Kääriäinen

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux