Re: Bad plan for ltree predicate <@

Roman Konoval <rkonoval@xxxxxxxxx> · Sat, 2 Dec 2017 03:34:43 +0100

Hi Tom,

Thanks for your help.

On Dec 1, 2017, at 22:33, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

 The seqscan formulation of the query results in evaluating
this function afresh at most of the rows

The function is defined as STABLE. I though that means that there is no need
 to reevaluate it on every row as input parameter is the same for every row and
 return value will be the same during the same query execution. Do I understand
 incorrectly what STABLE means?
Why is the function evaluated more than once?

, whereas shoving it into an
uncorrelated sub-select causes it to be evaluated only once.  That, I
think, and not the seqscan-vs-indexscan aspect, is what makes the bitmap
formulation go faster.  Certainly you'd not expect that a bitmap scan that
has to hit most of the rows anyway is going to win over a seqscan.

The fact that the planner goes for a bitmap scan in the second formulation
is an artifact of the fact that it doesn't try to pre-evaluate sub-selects
for selectivity estimation purposes, so you end up with a default estimate
that says that the <@ condition only selects a small fraction of the rows.
Not sure if we should try to change that or not.

I'd suggest setting the function's cost to 1000 or so and seeing if that
doesn't improve matters.

If I set function cost to 1000 I get slightly better plan but still 3.5 more buffers are read when compared to bitmap scan which as you wrote one would expect to be slower than seq scan.Here is the plan:

                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=216438.81..216438.82 rows=1 width=0) (actual time=1262.244..1262.245 rows=1 loops=1)
   Buffers: shared hit=169215
   CTE trees
     ->  Index Scan using document_head__id_path__gist__idx on document_head d  (cost=2.91..212787.85 rows=162265 width=74) (actual time=0.115..727.119 rows=154854 loops=1)
           Index Cond: (id_path <@ get_doc_path('78157c60-45bc-42c1-9aad-c5651995db5c'::character varying))
           Filter: (((id)::text <> '78157c60-45bc-42c1-9aad-c5651995db5c'::text) AND ((state)::text <> 'DELETED'::text))
           Rows Removed by Filter: 23
           Buffers: shared hit=169215
   ->  CTE Scan on trees  (cost=0.00..3245.30 rows=162265 width=0) (actual time=0.119..1118.899 rows=154854 loops=1)
         Buffers: shared hit=169215
 Total runtime: 1277.010 ms
(11 rows)

My understanding is that the optimal plan in this case should read less data than bitmap scan by the amount of buffers hit by bitmap index scan. 
It should read roughly all buffers of the table itself. Something like the query with predicate using ltree literal instead of function invocation:

explain (analyze, buffers)
with trees AS (
SELECT d.id, d.snapshot_id , NULL :: text[] AS permissions
  FROM document_head AS d
  WHERE (d.id_path <@ '869c0187_51ae_4deb_a36f_0425fdafda6e.78157c60_45bc_42c1_9aad_c5651995db5c'::ltree AND d.id != '78157c60-45bc-42c1-9aad-c5651995db5c') AND d.state != 'DELETED'
)
SELECT COUNT(*) FROM trees;
                                                                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=42114.02..42114.03 rows=1 width=0) (actual time=997.427..997.427 rows=1 loops=1)
   Buffers: shared hit=35230
   CTE trees
     ->  Seq Scan on document_head d  (cost=0.00..38463.06 rows=162265 width=74) (actual time=0.013..593.082 rows=154854 loops=1)
           Filter: ((id_path <@ '869c0187_51ae_4deb_a36f_0425fdafda6e.78157c60_45bc_42c1_9aad_c5651995db5c'::ltree) AND ((id)::text <> '78157c60-45bc-42c1-9aad-c5651995db5c'::text) AND ((state)::text <> 'DELETED'::text))
           Rows Removed by Filter: 23357
           Buffers: shared hit=35230
   ->  CTE Scan on trees  (cost=0.00..3245.30 rows=162265 width=0) (actual time=0.017..888.076 rows=154854 loops=1)
         Buffers: shared hit=35230
 Total runtime: 1011.565 ms
(10 rows)

The question is if it possible to get plan like that using function or some other way to get ltree value for given document_head.id value in one query?

As an alternative I can get ltree value with the separate query but this would require
1. a round-trip to postgres
2. me to change isolation level to REPEATABLE READ to make sure that I get consistent result 
so I would like to avoid that.

Regards,
Roman Konoval