On Mon, 2021-12-06 at 10:18 -0700, Michael Lewis wrote:
> What is your default_statistics_target and how accurate is that
> estimate of 5668 rows? What is random_page_cost set to by the way?

default_statistics_target = 1000
random_page_cost = 2.0 (it's on AWS on a 9000 iops gp2 volume)

PostgreSQL 13.5, by the way.

The estimate was reasonably accurate; there were 5069 actual rows matching.

> More importantly, what is the better plan that you'd like the planner
> to use with your existing indexes?

Well, it takes a few ms to grab all 5000 rows by shipment_import_id and
then sort/limit them. Instead it takes 30 seconds to do what it is doing
now, and that's only when the table is mostly cached already; it's more
like 4-5 minutes otherwise.

# explain analyze SELECT "shipment_import_records".* FROM
"shipment_import_records" WHERE
"shipment_import_records"."shipment_import_id" = 5090609 ORDER BY
"shipment_import_records"."id" ASC LIMIT 1;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Limit  (cost=0.44..873.08 rows=1 width=243) (actual time=31689.725..31689.726 rows=1 loops=1)
   ->  Index Scan using shipment_import_records_pkey on shipment_import_records  (cost=0.44..5122405.71 rows=5870 width=243) (actual time=31689.723..31689.724 rows=1 loops=1)
         Filter: (shipment_import_id = 5090609)
         Rows Removed by Filter: 28710802
 Planning Time: 0.994 ms
 Execution Time: 31689.744 ms
(6 rows)

And with a kludge to force the better index:

# explain analyze SELECT * FROM (SELECT "shipment_import_records".* FROM
"shipment_import_records" WHERE
"shipment_import_records"."shipment_import_id" = 5090609 OFFSET 0) AS x
ORDER BY "id" ASC LIMIT 1;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Limit  (cost=10655.34..10655.34 rows=1 width=243) (actual time=4.868..4.869 rows=1 loops=1)
   ->  Sort  (cost=10655.34..10670.02 rows=5870 width=243) (actual time=4.867..4.868 rows=1 loops=1)
         Sort Key: shipment_import_records.id
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Index Scan using index_shipment_import_records_on_shipment_import_id on shipment_import_records  (cost=0.44..10567.29 rows=5870 width=243) (actual time=0.037..3.560 rows=5069 loops=1)
               Index Cond: (shipment_import_id = 5090609)
 Planning Time: 0.135 ms
 Execution Time: 4.885 ms
(8 rows)

> Certainly a composite index would be very helpful here. Using explain
> analyze and sharing the output would give more info to go on.

Yeah, I am going to just do the composite index for now, but was hoping
for a more generic option. Thanks for looking at it.
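For the record, the composite index I have in mind is something along these
lines (the index name is made up; the point is shipment_import_id first for
the equality match, then id, so ORDER BY id LIMIT 1 can read the first
matching index entry without a sort):

```sql
-- Hypothetical name. Equality column first, ORDER BY column second,
-- so the planner can satisfy WHERE + ORDER BY + LIMIT from one scan.
CREATE INDEX CONCURRENTLY shipment_import_records_import_id_id_idx
    ON shipment_import_records (shipment_import_id, id);
```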