RE: Why is there a Sort after an Index Only Scan?

André Hänsel <andre@xxxxxxxx> · Thu, 5 May 2022 03:12:40 +0200

> They are sorted by order_id only within sets of the same shipping_date, which is not good enough.  

Ah yes, that totally makes sense for the general case.

> so maybe you can just tweak that application to test if the start and end dates are the same and use equality when they are.

I definitely can.

But now I have a followup question, which probably should have been a separate question all along. I have modified the example a bit to have a more natural date distribution and I got rid of the weird shipping_date condition and actually made it different dates, so the index order is out of the picture. I also added some statistics so Postgres knows about the relationship between the columns.

https://dbfiddle.uk/?rdbms=postgres_14&fiddle=54c7774432e896e3c0e89d8084c4b194

After inserting more rows, Postgres still chooses a scan on the primary key instead of using the index.

Limit  (cost=0.43..296.63 rows=50 width=4) (actual time=1052.692..1052.737 rows=50 loops=1)
  ->  Index Scan using orders_test_pkey on orders_test  (cost=0.43..71149.43 rows=12010 width=4) (actual time=1052.690..1052.728 rows=50 loops=1)
        Filter: ((shipping_date >= '2022-04-30'::date) AND (shipping_date <= '2022-05-01'::date))
        Rows Removed by Filter: 1998734

By setting the CPU costs to 0 (last block in the fiddle) I can force the use of the previous plan and as I already suspected it is much better:

Limit  (cost=101.00..101.00 rows=50 width=4) (actual time=4.835..4.843 rows=50 loops=1)
  ->  Sort  (cost=101.00..101.00 rows=12010 width=4) (actual time=4.833..4.837 rows=50 loops=1)
        Sort Key: order_id
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Index Scan using orders_test_shipping_date_idx on orders_test  (cost=0.00..101.00 rows=12010 width=4) (actual time=0.026..3.339 rows=11266 loops=1)
              Index Cond: ((shipping_date >= '2022-04-30'::date) AND (shipping_date <= '2022-05-01'::date))

Is it overestimating the cost of the sorting?