On Fri, 8 May 2020 at 10:00, David G. Johnston <david.g.johnston@xxxxxxxxx> wrote: > > On Thu, May 7, 2020 at 11:07 AM Amarendra Konda <amar.vijaya@xxxxxxxxx> wrote: >> >> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id, pa.created limit 50; >> >> >> -> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2) > > > Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone)) > > I suppose during the nested loop the inner index scan could limit itself to the first 50 entries it finds (since the first two index columns are being held constant on each scan, m.created should define the traversal order...) so that the output of the nested loop ends up being (max 2 x 50) 100 entries which are then sorted and only the top 50 returned. > > Whether the executor could but isn't doing that here or isn't programmed to do that (or my logic is totally off) I do not know. I think the planner is likely not putting the process_activity table on the outer side of the nested loop join due to the poor row estimates. If it knew that so many rows would match the join then it likely would have done that to save from having to perform the sort at all. However, because the planner has put the process_instance on the outer side of the nested loop join, it's the pathkeys from that path that the nested loop node has, which is not the same as what the ORDER BY needs, so the planner must add a sort step, which means that all rows from the nested loop plan must be read so that they can be sorted. It might be worth trying: create index on process_instance (user_id,app_id); as that might lower the cost of performing the join in the opposite order and have the planner prefer that order instead. If doing that, the OP could then ditch the fki_conv_konotor_user_user_id index to save space. If that's not enough to convince the planner that the opposite order is better then certainly SET enable_sort TO off; would. David