Hi,
I found very weird behaviour on planner side with estimation error of 700.000.000.
Situation (with explain analyze):
EXPLAIN ANALYZE
select * from person2obj
WHERE
p2o_id IN (SELECT p2o_id::bigint FROM (SELECT * FROM (SELECT column1 AS p2o_id FROM (
VALUES ('2056892'), up to 199 values total
) AS __CDP_VALUES__) AS __CDP_DATA__) AS __TARGET__ );
;
QUERY PLANEXPLAIN ANALYZE
select * from person2obj
WHERE
p2o_id IN (SELECT p2o_id::bigint FROM (SELECT * FROM (SELECT column1 AS p2o_id FROM (
VALUES ('2056892'), up to 199 values total
) AS __CDP_VALUES__) AS __CDP_DATA__) AS __TARGET__ );
;
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2.99..16316.34 rows=199 width=58) (actual time=0.196..1.202 rows=198 loops=1)
-> HashAggregate (cost=2.99..4.98 rows=199 width=32) (actual time=0.160..0.205 rows=199 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..2.49 rows=199 width=32) (actual time=0.003..0.088 rows=199 loops=1)
-> Index Scan using pk_person2obj on person2obj (cost=0.00..81.96 rows=1 width=58) (actual time=0.004..0.004 rows=1 loops=199)
Index Cond: (p2o_id = ("*VALUES*".column1)::bigint)
Estimate looks pretty reasonable.
However, with length of the value list 200 (or more), the database switch to completely different (and very weird) estimation of 700.000.000:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=3.00..16398.33 rows=714143698 width=58) (actual time=0.200..1.239 rows=200 loops=1)
-> HashAggregate (cost=3.00..5.00 rows=200 width=32) (actual time=0.165..0.201 rows=200 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..2.50 rows=200 width=32) (actual time=0.004..0.090 rows=200 loops=1)
-> Index Scan using pk_person2obj on person2obj (cost=0.00..81.96 rows=1 width=58) (actual time=0.004..0.004 rows=1 loops=200)
Index Cond: (p2o_id = ("*VALUES*".column1)::bigint)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=3.00..16398.33 rows=714143698 width=58) (actual time=0.200..1.239 rows=200 loops=1)
-> HashAggregate (cost=3.00..5.00 rows=200 width=32) (actual time=0.165..0.201 rows=200 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..2.50 rows=200 width=32) (actual time=0.004..0.090 rows=200 loops=1)
-> Index Scan using pk_person2obj on person2obj (cost=0.00..81.96 rows=1 width=58) (actual time=0.004..0.004 rows=1 loops=200)
Index Cond: (p2o_id = ("*VALUES*".column1)::bigint)
The all estimates looks ok until the final nested loop plan estimate of 700.000.000
PS: the person2obj table contains ~1.4 billion tuples, p2o_id - primary key.
--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@xxxxxxxxx
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
Senior Postgresql DBA
http://www.postgresql-consulting.ru/
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@xxxxxxxxx
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."