Hi Pushkar:

On Wed, Mar 30, 2016 at 8:40 AM, Deole, Pushkar (Pushkar) <pdeole@xxxxxxxxx> wrote:
> Does PostgreSQL support a query to fetch last 'n' records that match the
> selection criteria. I am trying to fetch records from a table with start
> date that falls in last 30 days, however, I want to fetch the oldest 'n'
> records and not the recent ones. I know there is a LIMIT clause which I can
> use but it will fetch the first 'n' records.
>
> I came across an approach which says that I can reverse the order and then
> use LIMIT and then order the records back using timestamp as below, but
> looking at the execution plan, it has to do a sort twice which may affect
> the performance of query if 'n' is large number:

To get at the last N records you generally have two approaches: read all of them and keep only the last N, or walk an index backwards. Do you have an index on the record date? Because in that case it seems this could be solved by a reverse index scan. In my case, with a somewhat big table:

$ \d carrier_cdrs_201603
      Table "public.carrier_cdrs_201603"
 Column  |           Type           | Modifiers
---------+--------------------------+-----------
 ...
 setup   | timestamp with time zone |
 ...
Indexes:
    "idx_carrier_cdrs_201603_setup" btree (setup)
    ...

$ explain select * from carrier_cdrs_201603 order by setup desc limit 1000;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..46.25 rows=1000 width=81)
   ->  Index Scan Backward using idx_carrier_cdrs_201603_setup on carrier_cdrs_201603  (cost=0.42..33585.03 rows=732883 width=81)
(2 rows)

$ explain select * from ( select * from carrier_cdrs_201603 order by setup desc limit 1000 ) last_1000 order by setup;
                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=106.08..108.58 rows=1000 width=81)
   Sort Key: carrier_cdrs_201603.setup
   ->  Limit  (cost=0.42..46.25 rows=1000 width=81)
         ->  Index Scan Backward using idx_carrier_cdrs_201603_setup on carrier_cdrs_201603  (cost=0.42..33585.03 rows=732883 width=81)
(4 rows)

$ explain with last_1000 as ( select * from carrier_cdrs_201603 order by setup desc limit 1000 ) select * from last_1000 order by setup;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=116.08..118.58 rows=1000 width=184)
   Sort Key: last_1000.setup
   CTE last_1000
     ->  Limit  (cost=0.42..46.25 rows=1000 width=81)
           ->  Index Scan Backward using idx_carrier_cdrs_201603_setup on carrier_cdrs_201603  (cost=0.42..33585.03 rows=732883 width=81)
   ->  CTE Scan on last_1000  (cost=0.00..20.00 rows=1000 width=184)
(6 rows)
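In your case, folding the 30 day filter into the subquery form would look something like this ( just a sketch: I'm making up the table and column names, so adjust them to your schema, and it assumes an index on the timestamp column so the inner query can do a backward index scan ):

select *
from (
    select *
    from your_table                                  -- placeholder name
    where start_date >= now() - interval '30 days'   -- your selection criteria
    order by start_date desc                         -- walk the index backwards
    limit 1000                                       -- keep only the last N rows in the window
) last_n
order by start_date;                                 -- re-sort just those N rows ascending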
The fastest for me seems to be the subquery way; with timings, and using 10k records, it says:

$ explain analyze select * from ( select * from carrier_cdrs_201603 order by setup desc limit 10000 ) last_10000 order by setup;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1223.09..1248.09 rows=10000 width=81) (actual time=29.646..35.780 rows=10000 loops=1)
   Sort Key: carrier_cdrs_201603.setup
   Sort Method: quicksort  Memory: 1791kB
   ->  Limit  (cost=0.42..458.70 rows=10000 width=81) (actual time=0.015..20.707 rows=10000 loops=1)
         ->  Index Scan Backward using idx_carrier_cdrs_201603_setup on carrier_cdrs_201603  (cost=0.42..33627.38 rows=733773 width=81) (actual time=0.013..8.835 rows=10000 loops=1)
 Total runtime: 41.913 ms
(6 rows)

And I fear it is scanning and feeding into the sort, and accounting part of the scan time to the sort phase, as just the inner query gives:

$ explain analyze select * from carrier_cdrs_201603 order by setup desc limit 10000 ;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..458.70 rows=10000 width=81) (actual time=0.015..20.938 rows=10000 loops=1)
   ->  Index Scan Backward using idx_carrier_cdrs_201603_setup on carrier_cdrs_201603  (cost=0.42..33627.38 rows=733773 width=81) (actual time=0.013..8.803 rows=10000 loops=1)
 Total runtime: 27.020 ms
(3 rows)

So, about 14 ms to sort 10k records seems like a reasonable price to pay. As you see, there is only one sort, whichever way I write it, and Postgres sorts really fast. This is very difficult to avoid; a smarter optimizer could turn the sort into a simple reversal of the already-ordered rows, but that seems difficult.

Or you could try to use a cursor, go to the last record, then skip N backwards and read from there, but IMHO it's not worth the complexity and, at least in my case, it is slower for 1000 records. YMMV:

$ begin;
BEGIN
Time: 61.229 ms
$ declare last_1000 scroll cursor for select * from carrier_cdrs_201603 order by setup;
DECLARE CURSOR
Time: 61.025 ms
$ move last in last_1000;
MOVE 1
Time: 282.142 ms
$ move backward 1000 in last_1000;
MOVE 1000
Time: 61.969 ms
$ fetch all from last_1000;
Time: 248.071 ms
$ close last_1000;
CLOSE CURSOR
Time: 60.922 ms
$ commit;
COMMIT
Time: 60.814 ms

Note how, once you account for my 60 ms RTT, it takes about 220 ms to go to the end and about 188 ms to fetch the result, while:

cdrs=# select * from ( select * from carrier_cdrs_201603 order by setup desc limit 1000 ) last_1000 order by setup;
Time: 248.566 ms

I can do the whole select in just about 188 ms too. ( These are just 1000 records, but without explain analyze a nice chunk of the time is spent sending them over my 60 ms RTT connection. )

Anyway, try things, measure, post results so we know what happens.

Francisco Olarte.
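PS: if you do end up trying the cursor route with your 30 day filter, the skeleton would be roughly this ( again just a sketch, with made-up table and column names, so adjust to your schema ):

begin;
declare last_n scroll cursor for
    select *
    from your_table                                  -- placeholder name
    where start_date >= now() - interval '30 days'   -- your selection criteria
    order by start_date;
move last in last_n;            -- jump to the end of the result set
move backward 1000 in last_n;   -- step back N rows
fetch all from last_n;          -- returns the last N rows, already in ascending order
close last_n;
commit;

But, as the timings above show, on my data the plain subquery was both simpler and faster.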