Hi,
PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
We have noticed huge difference interms of execution plan ( response time) , When we pass the direct values Vs inner query to IN clause.
High level details of the use case are as follows
- As part of the SQL there are 2 tables named Process_instance (master) and Process_activity ( child)
- Wanted to fetch TOP 50 rows from Process_activity table for the given values of the Process_instance.
- When we used Inner Join / Inner query ( query1) between parent table and child table , LIMIT is not really taking in to account. Instead it is fetching more rows and columns that required, and finally limiting the result
Query1
web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id FROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (SELECT pi.process_instance_id FROM process_instance pi WHERE pi.user_id = '317079413683604' AND pi.app_id = '427380312000560') ORDER BY pa.process_instance_id,pa.created limit 50;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1071.47..1071.55 rows=31 width=24) (actual time=85.958..85.991 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43065
-> Sort (cost=1071.47..1071.55 rows=31 width=24) (actual time=85.956..85.971 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43065
-> Nested Loop (cost=1.14..1070.70 rows=31 width=24) (actual time=0.031..72.183 rows=46992 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43065
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.010..0.013 rows=2 loops=1)
Output: pi.process_instance_id
Index Cond: (pi.user_id = '317079413683604'::bigint)
Filter: (pi.app_id = '427380312000560'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1053.80 rows=1425 width=24) (actual time=0.015..20.702 rows=23496 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.market
ing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translate, pa.in_reply_to
Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43060
Planning time: 0.499 ms
Execution time: 86.040 ms
(22 rows)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1071.47..1071.55 rows=31 width=24) (actual time=85.958..85.991 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43065
-> Sort (cost=1071.47..1071.55 rows=31 width=24) (actual time=85.956..85.971 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43065
-> Nested Loop (cost=1.14..1070.70 rows=31 width=24) (actual time=0.031..72.183 rows=46992 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43065
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.010..0.013 rows=2 loops=1)
Output: pi.process_instance_id
Index Cond: (pi.user_id = '317079413683604'::bigint)
Filter: (pi.app_id = '427380312000560'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1053.80 rows=1425 width=24) (actual time=0.015..20.702 rows=23496 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.market
ing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translate, pa.in_reply_to
Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43060
Planning time: 0.499 ms
Execution time: 86.040 ms
(22 rows)
Query 2
web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS m_process_activity_id FROM process_activity m WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (240117466018927,325820556706970,433008275197305) ORDER BY pa.process_instance_id,pa.created limit 50;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.70..37.66 rows=50 width=24) (actual time=0.023..0.094 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Buffers: shared hit=50
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..3124.97 rows=4226 width=24) (actual time=0.022..0.079 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Index Cond: ((pa.process_instance_id = ANY ('{140117466018927,225820556706970,233008275197305}'::bigint[])) AND (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=50
Planning time: 0.167 ms
Execution time: 0.137 ms
(9 rows)
web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS m_process_activity_id FROM process_activity m WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (240117466018927,325820556706970,433008275197305) ORDER BY pa.process_instance_id,pa.created limit 50;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.70..37.66 rows=50 width=24) (actual time=0.023..0.094 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Buffers: shared hit=50
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..3124.97 rows=4226 width=24) (actual time=0.022..0.079 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Index Cond: ((pa.process_instance_id = ANY ('{140117466018927,225820556706970,233008275197305}'::bigint[])) AND (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=50
Planning time: 0.167 ms
Execution time: 0.137 ms
(9 rows)
Can someone explain
- Why It is fetching more columns and more rows, incase of inner query ?
- Is there any option to really limit values with INNER JOIN, INNER query ? If yes, can you please share information on this ?
Thanks in advance for your time and suggestions.
Regards, Amar