Re: Different execution plan between PostgreSQL 8.4 and 12.11

David Rowley <dgrowleyml@xxxxxxxxx> · Wed, 12 Oct 2022 14:47:17 +1300

On Wed, 12 Oct 2022 at 13:06, Klint Gore <kgore4@xxxxxxxxxx> wrote:
> Limit  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.040 rows=1 loops=1)
>   ->  Unique  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.039 rows=1 loops=1)
>         ->  Index Only Scan using idx on tbl  (cost=0.56..28349.28 rows=995241 width=4) (actual time=0.038..0.038 rows=1 loops=1)
>               Index Cond: (fld = 230)
>               Heap Fetches: 0
> Planning Time: 0.066 ms
> Execution Time: 0.047 ms
>
> With the distinct and the limit, the planner somehow knows to push the either the distinct or the limit into the index only scan so the unique for distinct only had 1 row and the outer limit only had 1 row.  Without the limit, the distinct still does the index only scan but has to do the unique on the million rows and execution time goes to about 100ms.

I think that would be very simple to fix. I believe I've done that
locally but just detecting if needed_pathkeys == NULL in
create_final_distinct_paths().

i.e.

-                       if (pathkeys_contained_in(needed_pathkeys,
path->pathkeys))
+                       if (needed_pathkeys == NIL)
+                       {
+                               Node *limitCount = makeConst(INT8OID,
-1, InvalidOid,
+
                  sizeof(int64),
+
                  Int64GetDatum(1), false,
+
                  FLOAT8PASSBYVAL);
+                               add_path(distinct_rel, (Path *)
+
create_limit_path(root, distinct_rel, path, NULL,
+
            limitCount, LIMIT_OPTION_COUNT, 0,
+
            1));
+                       }
+                       else if
(pathkeys_contained_in(needed_pathkeys, path->pathkeys))

That just adds a Limit Path instead of the Unique Path. i.e:

postgres=# explain (analyze, costs off) select distinct a from t1 where a = 0;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Limit (actual time=0.074..0.075 rows=1 loops=1)
   ->  Index Only Scan using t1_a_idx on t1 (actual time=0.072..0.073
rows=1 loops=1)
         Index Cond: (a = 0)
         Heap Fetches: 1
 Planning Time: 0.146 ms
 Execution Time: 0.100 ms
(6 rows)

However, I might be wrong about that. I've not given it too much thought.

David