Window functions and index usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have the following setup:

create table test(id integer, seq integer);
insert into test select generate_series(0, 100), generate_series(0, 1000);
create unique index test_idx on test(id, seq);
analyze test;

Now I try to fetch the latest 5 values per id, ordered by seq from the table:

select * from (
select id, seq, row_number() over (partition by id order by seq)
  from test
 where id in (1, 2, 3)
) where row_number() <= 5;

This does not use the index on id, seq for sorting the data. It uses a bitmap index scan, and sequential scan when issued SET enable_bitmapscan to false. Tested both on git head and 8.4.8. See end of post for plans. It seems it would be possible to fetch the first values per id using the index, or at least skip the sorting.

If I emulate the behavior I want by using:
 (select id, seq from test where id = 1 order by seq limit 5)
union
 (select id, seq from test where id = 2 order by seq limit 5)
union
 (select id, seq from test where id = 2 order by seq limit 5);
I get two orders of magnitude faster execution.

Is there some other way to run the query so that it would use the index? Is there plans to support the index usage for the above query assuming that it is possible to use the index for that query?

The real world use case would be to fetch latest 5 threads per discussion forum in one query. Or fetch 3 latest comments for all patches in given commit fest in single query.

 - Anssi Kääriäinen

Normal plan (by the way, note the wildly inaccurate topmost row estimate):

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Subquery Scan on tmp (cost=711.65..805.05 rows=958 width=16) (actual time=10.543..27.469 rows=15 loops=1)
   Filter: (tmp.row_number <= 5)
-> WindowAgg (cost=711.65..769.13 rows=2874 width=8) (actual time=10.537..23.551 rows=3003 loops=1) -> Sort (cost=711.65..718.83 rows=2874 width=8) (actual time=10.528..13.798 rows=3003 loops=1)
               Sort Key: test.id, test.seq
               Sort Method: quicksort  Memory: 182kB
-> Bitmap Heap Scan on test (cost=59.04..546.55 rows=2874 width=8) (actual time=0.580..4.750 rows=3003 loops=1)
                     Recheck Cond: (id = ANY ('{1,2,3}'::integer[]))
-> Bitmap Index Scan on test_idx (cost=0.00..58.32 rows=2874 width=0) (actual time=0.490..0.490 rows=3003 loops=1)
                           Index Cond: (id = ANY ('{1,2,3}'::integer[]))
 Total runtime: 27.531 ms
(11 rows)

Plan with set enable_bitmapscan set to off:

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Subquery Scan on tmp (cost=2003.23..2096.64 rows=958 width=16) (actual time=32.907..47.279 rows=15 loops=1)
   Filter: (tmp.row_number <= 5)
-> WindowAgg (cost=2003.23..2060.71 rows=2874 width=8) (actual time=32.898..44.053 rows=3003 loops=1) -> Sort (cost=2003.23..2010.42 rows=2874 width=8) (actual time=32.883..36.067 rows=3003 loops=1)
               Sort Key: test.id, test.seq
               Sort Method: quicksort  Memory: 182kB
-> Seq Scan on test (cost=0.00..1838.14 rows=2874 width=8) (actual time=0.017..26.733 rows=3003 loops=1)
                     Filter: (id = ANY ('{1,2,3}'::integer[]))
 Total runtime: 47.334 ms
(9 rows)

The UNION approach:

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=28.47..28.62 rows=15 width=8) (actual time=0.176..0.194 rows=15 loops=1) -> Append (cost=0.00..28.40 rows=15 width=8) (actual time=0.026..0.149 rows=15 loops=1) -> Limit (cost=0.00..9.42 rows=5 width=8) (actual time=0.024..0.045 rows=5 loops=1) -> Index Scan using test_idx on test (cost=0.00..1820.87 rows=967 width=8) (actual time=0.022..0.034 rows=5 loops=1)
                     Index Cond: (id = 1)
-> Limit (cost=0.00..9.42 rows=5 width=8) (actual time=0.017..0.034 rows=5 loops=1) -> Index Scan using test_idx on test (cost=0.00..1820.87 rows=967 width=8) (actual time=0.015..0.023 rows=5 loops=1)
                     Index Cond: (id = 2)
-> Limit (cost=0.00..9.42 rows=5 width=8) (actual time=0.021..0.037 rows=5 loops=1) -> Index Scan using test_idx on test (cost=0.00..1820.87 rows=967 width=8) (actual time=0.019..0.026 rows=5 loops=1)
                     Index Cond: (id = 3)
 Total runtime: 0.258 ms
(12 rows)





--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux