Hi Forum,
I'm scratching my head around the following case:
te is a 80M rows, 100GB table. It is a bare simple select over indexed attribute of it.
EXPLAIN SELECT te.id FROM te WHERE te.current_pid IN (240900026,
240900027,
240900028,
-- 200 entries ...
Gather (cost=1000.00..61517367.85 rows=3870 width=8)
Workers Planned: 2
-> Parallel Seq Scan on te (cost=0.00..61515980.85 rows=1612 width=8)
Filter: (current_pid = ANY ('{240900026,240900027,...240901129}'::bigint[]))
Execution time is about 5 minutes
Reducing number of current_pids to 100 changes the plan and it does index scan. (101 still does seq scan)
Index Scan using te_current_pid_idx on te (cost=0.57..731.26 rows=3832 width=8) (actual time=0.566..1.667 rows=600 loops=1)
Index Cond: (current_pid = ANY ('{240900026,240900027,...240900194}'::bigint[]))
Planning Time: 3.152 ms
Execution Time: 1.732 ms
Selecting 200 pids rewritten with CTE goes for index too.
EXPLAIN ANALYZE
WITH cte as (
select * from unnest(ARRAY[
240900026,
240900027,
240900028,
...
240901129
]))
SELECT te.id FROM te join cte on te.current_pid = cte.unnest;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.58..1097.83 rows=3847 width=8) (actual time=0.882..14.927 rows=1468 loops=1)
CTE cte
-> Function Scan on unnest (cost=0.00..1.00 rows=100 width=4) (actual time=0.025..0.043 rows=205 loops=1)
-> CTE Scan on cte (cost=0.00..2.00 rows=100 width=4) (actual time=0.027..0.083 rows=205 loops=1)
-> Index Scan using te_current_pid_idx on te (cost=0.57..10.57 rows=38 width=16) (actual time=0.011..0.071 rows=7 loops=205)
Index Cond: (current_pid = cte.unnest)
Planning Time: 2.022 ms
Execution Time: 15.044 ms
I tried random_page_cost=1, a couple of combinations with very low
cpu_index_tuple_cost and cpu_operator_cost. Only managed to get an index scan for a few more IN entries.
EXPLAIN SELECT te.id FROM te WHERE te.current_pid IN (240900026,
240900027,
240900028,
-- 200 entries ...
Gather (cost=1000.00..61517367.85 rows=3870 width=8)
Workers Planned: 2
-> Parallel Seq Scan on te (cost=0.00..61515980.85 rows=1612 width=8)
Filter: (current_pid = ANY ('{240900026,240900027,...240901129}'::bigint[]))
Execution time is about 5 minutes
Reducing number of current_pids to 100 changes the plan and it does index scan. (101 still does seq scan)
Index Scan using te_current_pid_idx on te (cost=0.57..731.26 rows=3832 width=8) (actual time=0.566..1.667 rows=600 loops=1)
Index Cond: (current_pid = ANY ('{240900026,240900027,...240900194}'::bigint[]))
Planning Time: 3.152 ms
Execution Time: 1.732 ms
Selecting 200 pids rewritten with CTE goes for index too.
EXPLAIN ANALYZE
WITH cte as (
select * from unnest(ARRAY[
240900026,
240900027,
240900028,
...
240901129
]))
SELECT te.id FROM te join cte on te.current_pid = cte.unnest;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.58..1097.83 rows=3847 width=8) (actual time=0.882..14.927 rows=1468 loops=1)
CTE cte
-> Function Scan on unnest (cost=0.00..1.00 rows=100 width=4) (actual time=0.025..0.043 rows=205 loops=1)
-> CTE Scan on cte (cost=0.00..2.00 rows=100 width=4) (actual time=0.027..0.083 rows=205 loops=1)
-> Index Scan using te_current_pid_idx on te (cost=0.57..10.57 rows=38 width=16) (actual time=0.011..0.071 rows=7 loops=205)
Index Cond: (current_pid = cte.unnest)
Planning Time: 2.022 ms
Execution Time: 15.044 ms
I tried random_page_cost=1, a couple of combinations with very low
cpu_index_tuple_cost and cpu_operator_cost. Only managed to get an index scan for a few more IN entries.
Did analyze. Bumped stats target for current_pid to 5000. Did not help.
I'm out of ideas. What is the right approach to solve this ?
Thank You!
Rado