I was almost through writing a bug report when I figured out what I was doing wrong, so I'll post it here in the hope that it prevents someone from tearing their hair out. I was trying to use a loose index scan as described on https://wiki.postgresql.org/wiki/Loose_indexscan on a column with just 3 distinct values in a table with about 20 million rows. To my surprise it was not faster than a simple “select distinct”, which was doing a sequential scan. Here is the plan: wds=> explain analyze WITH RECURSIVE t AS ( SELECT MIN(periodizitaet) AS periodizitaet FROM facttable_imf_ifs UNION ALL SELECT (SELECT MIN(periodizitaet) FROM facttable_imf_ifs WHERE periodizitaet > t.periodizitaet) FROM t WHERE t.periodizitaet IS NOT NULL ) SELECT periodizitaet FROM t WHERE periodizitaet IS NOT NULL ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ CTE Scan on t (cost=993270.41..993272.43 rows=100 width=32) (actual time=8073.953..8074.042 rows=3 loops=1) Filter: (periodizitaet IS NOT NULL) Rows Removed by Filter: 1 CTE t -> Recursive Union (cost=993206.56..993270.41 rows=101 width=32) (actual time=8073.952..8074.039 rows=4 loops=1) -> Aggregate (cost=993206.56..993206.57 rows=1 width=2) (actual time=8073.948..8073.948 rows=1 loops=1) -> Seq Scan on facttable_imf_ifs facttable_imf_ifs_1 (cost=0.00..940457.05 rows=21099805 width=2) (actual time=0.258..3567.893 rows=20649561 loops=1) -> WorkTable Scan on t t_1 (cost=0.00..6.18 rows=10 width=32) (actual time=0.021..0.021 rows=1 loops=4) Filter: (periodizitaet IS NOT NULL) Rows Removed by Filter: 0 SubPlan 2 -> Result (cost=0.59..0.60 rows=1 width=0) (actual time=0.025..0.025 rows=1 loops=3) InitPlan 1 (returns $2) -> Limit (cost=0.44..0.59 rows=1 width=2) (actual time=0.024..0.024 rows=1 loops=3) -> Index Only Scan using facttable_imf_ifs_periodizitaet_idx on facttable_imf_ifs (cost=0.44..1059854.76 rows=7033268 width=2) (actual time=0.022..0.022 rows=1 loops=3) Index Cond: ((periodizitaet IS NOT NULL) AND (periodizitaet > t_1.periodizitaet)) Heap Fetches: 2 Planning time: 9.261 ms Execution time: 8074.219 ms (19 rows) So it's doing a sequential scan on the initial select in the recursive CTE, but using the index on the subsequent selects. But why? If it uses the index on SELECT MIN(periodizitaet) FROM facttable_imf_ifs WHERE periodizitaet > 'x' shouldn't it be able to use the same index on SELECT MIN(periodizitaet) FROM facttable_imf_ifs ? I was playing around a bit with different values and not getting anywhere, until it hit me: facttable_imf_ifs_periodizitaet_idx is a partial index with a “WHERE periodizitaet IS NOT NULL” clause. This is fine for a “WHERE periodizitaet > 'x'” query, because that implies NOT NULL, but a straight unadorned “SELECT MIN(periodizitaet)” could return NULL, so it can't use the index. Add “where periodizitaet is not null” to the initial query (I'm not interested in null values, that index is partial for a reason) and all is well: wds=> explain analyze WITH RECURSIVE t AS ( SELECT MIN(periodizitaet) AS periodizitaet FROM facttable_imf_ifs where periodizitaet is not null UNION ALL SELECT (SELECT MIN(periodizitaet) FROM facttable_imf_ifs WHERE periodizitaet > t.periodizitaet) FROM t WHERE t.periodizitaet IS NOT NULL ) SELECT periodizitaet FROM t WHERE periodizitaet IS NOT NULL ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- CTE Scan on t (cost=64.40..66.42 rows=100 width=32) (actual time=0.046..0.704 rows=3 loops=1) Filter: (periodizitaet IS NOT NULL) Rows Removed by Filter: 1 CTE t -> Recursive Union (cost=0.51..64.40 rows=101 width=32) (actual time=0.043..0.698 rows=4 loops=1) -> Result (cost=0.51..0.52 rows=1 width=0) (actual time=0.042..0.042 rows=1 loops=1) InitPlan 1 (returns $1) -> Limit (cost=0.44..0.51 rows=1 width=2) (actual time=0.039..0.040 rows=1 loops=1) -> Index Only Scan using facttable_imf_ifs_periodizitaet_idx on facttable_imf_ifs (cost=0.44..1516760.47 rows=21080284 width=2) (actual time=0.038..0.038 rows=1 loops=1) Index Cond: ((periodizitaet IS NOT NULL) AND (periodizitaet IS NOT NULL)) Heap Fetches: 1 -> WorkTable Scan on t t_1 (cost=0.00..6.19 rows=10 width=32) (actual time=0.161..0.162 rows=1 loops=4) Filter: (periodizitaet IS NOT NULL) Rows Removed by Filter: 0 SubPlan 3 -> Result (cost=0.59..0.60 rows=1 width=0) (actual time=0.212..0.212 rows=1 loops=3) InitPlan 2 (returns $3) -> Limit (cost=0.44..0.59 rows=1 width=2) (actual time=0.211..0.211 rows=1 loops=3) -> Index Only Scan using facttable_imf_ifs_periodizitaet_idx on facttable_imf_ifs facttable_imf_ifs_1 (cost=0.44..1061729.65 rows=7026761 width=2) (actual time=0.208..0.208 rows=1 loops=3) Index Cond: ((periodizitaet IS NOT NULL) AND (periodizitaet > t_1.periodizitaet)) Heap Fetches: 2 Planning time: 8.883 ms Execution time: 0.801 ms (23 rows) 800 times faster :-). hp -- _ | Peter J. Holzer | A coding theorist is someone who doesn't |_|_) | | think Alice is crazy. | | | hjp@xxxxxx | -- John Gordon __/ | http://www.hjp.at/ | http://downlode.org/Etext/alicebob.html
Attachment:
signature.asc
Description: Digital signature