Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )

David Rowley <dgrowleyml@xxxxxxxxx> · Fri, 8 May 2020 11:46:00 +1200

On Fri, 8 May 2020 at 10:00, David G. Johnston
<david.g.johnston@xxxxxxxxx> wrote:
>
> On Thu, May 7, 2020 at 11:07 AM Amarendra Konda <amar.vijaya@xxxxxxxxx> wrote:
>>
>> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)  SELECT pa.process_activity_id AS pa_process_activity_id  FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00'  AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id  AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,  pa.created limit 50;
>>
>>
>>                ->  Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa  (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2)
>
> > Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
>
> I suppose during the nested loop the inner index scan could limit itself to the first 50 entries it finds (since the first two index columns are being held constant on each scan, m.created should define the traversal order...) so that the output of the nested loop ends up being (max 2 x 50) 100 entries which are then sorted and only the top 50 returned.
>
> Whether the executor could but isn't doing that here or isn't programmed to do that (or my logic is totally off) I do not know.

I think the planner is likely not putting the process_activity table
on the outer side of the nested loop join due to the poor row
estimates.  If it knew that so many rows would match the join then it
likely would have done that to save from having to perform the sort at
all.  However, because the planner has put the process_instance on the
outer side of the nested loop join, it's the pathkeys from that path
that the nested loop node has, which is not the same as what the ORDER
BY needs, so the planner must add a sort step, which means that all
rows from the nested loop plan must be read so that they can be
sorted.

It might be worth trying: create index on process_instance
(user_id,app_id); as that might lower the cost of performing the join
in the opposite order and have the planner prefer that order instead.
If doing that, the OP could then ditch the
fki_conv_konotor_user_user_id index to save space.

If that's not enough to convince the planner that the opposite order
is better then certainly SET enable_sort TO off; would.

David