Hi,

I'm running PostgreSQL 9.6.7 on Debian stretch, and I'm trying to understand a query plan; any hint on where to gain further insight is welcome.

Situation: two tables, both with a bigint source_id as primary key:

gaia.dr2light -- roughly 1.6 billion rows, about 30 columns, all of them reals, double precisions, or bigints.

gaia.dr2epochflux -- roughly 0.5 million rows, about 15 columns, 10 of which are arrays, typically real[] or double precision[] with 10..30 array elements each.

The source_ids in dr2epochflux are a small subset of the ones in dr2light. Also, dr2light has an index on a column called parallax, and

    SELECT count(*) FROM gaia.dr2light WHERE parallax > 50;

gives 5400 rows in no time. The distribution is such that this will actually touch 5400 disk blocks on gaia.dr2light. Both tables are read-only (i.e., they have been ingested and analyzed once and have remained unchanged ever since).

The story:

    SELECT * FROM gaia.dr2epochflux
    JOIN gaia.dr2light USING (source_id)
    WHERE parallax > 50;                                 -- (query 1)

is horribly slow (like, many minutes). Forcing the planner to do the right thing, like this:

    SELECT * FROM gaia.dr2epochflux
    JOIN (
      SELECT * FROM gaia.dr2light WHERE parallax > 50 OFFSET 0
    ) AS q USING (source_id);                            -- (query 2)

makes the query execute in about a second (and yields 18 rows). Why would Postgres choose a dramatically inferior plan?
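For reference, the setup described above can be sketched roughly like this (assumed DDL, not the actual ingest scripts; all column names other than source_id and parallax, and the array column shown, are placeholders):

    CREATE TABLE gaia.dr2light (
      source_id bigint PRIMARY KEY,
      parallax  double precision
      -- ... about 28 more real/double precision/bigint columns
    );
    CREATE INDEX dr2light_parallax ON gaia.dr2light (parallax);

    CREATE TABLE gaia.dr2epochflux (
      source_id      bigint PRIMARY KEY,
      g_transit_flux real[]   -- placeholder name; ~10 such array columns,
                              -- each with 10..30 elements per row
      -- ... about 4 more scalar columns
    );

The wide array columns are what make each dr2epochflux row large (width=1523 in the plans below).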
Here's what EXPLAIN gives for query 2 (the good plan):

     Hash Join  (cost=472743.04..26313612.35 rows=551038 width=1647)
       Hash Cond: (q.source_id = dr2epochflux.source_id)
       ->  Subquery Scan on q  (cost=243173.69..25417932.01 rows=12991627 width=132)
             ->  Bitmap Heap Scan on dr2light  (cost=243173.69..25288015.74 rows=12991627 width=132)
                   Recheck Cond: (parallax > '50'::double precision)
                   ->  Bitmap Index Scan on dr2light_parallax  (cost=0.00..239925.78 rows=12991627 width=0)
                         Index Cond: (parallax > '50'::double precision)
       ->  Hash  (cost=118285.38..118285.38 rows=551038 width=1523)
             ->  Seq Scan on dr2epochflux  (cost=0.00..118285.38 rows=551038 width=1523)

And here's the bad plan (for query 1):

     Nested Loop  (cost=0.58..4801856.96 rows=4229 width=1647)
       ->  Seq Scan on dr2epochflux  (cost=0.00..118285.38 rows=551038 width=1523)
       ->  Index Scan using dr2light_pkey on dr2light  (cost=0.58..8.49 rows=1 width=132)
             Index Cond: (source_id = dr2epochflux.source_id)
             Filter: (parallax > '50'::double precision)

With enable_seqscan = 0, it comes up with this for query 1:

     Nested Loop  (cost=1.00..4810726.18 rows=4229 width=1647)
       ->  Index Scan using dr2epochflux_pkey on dr2epochflux  (cost=0.42..127154.60 rows=551038 width=1523)
       ->  Index Scan using dr2light_pkey on dr2light  (cost=0.58..8.49 rows=1 width=132)
             Index Cond: (source_id = dr2epochflux.source_id)
             Filter: (parallax > '50'::double precision)

-- which in reality appears to be a good deal faster than the "bad" plan, though still much, much slower than the "good" plan.

Both tables are ANALYZE-d, and they should be reasonably VACUUM-ed. Is there anything I can do to make it easier for the planner to see the light?

          -- Markus
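P.S. In case it helps anyone reproduce the mismatch: the plans above estimate rows=12991627 for parallax > 50, while the actual count is 5400, so the planner's statistics for parallax seem to be the first thing to inspect. A sketch of what I'd run next (standard 9.6 facilities; nothing here is specific to my setup):

    -- actual vs. estimated rows and I/O, for each variant:
    EXPLAIN (ANALYZE, BUFFERS)
      SELECT * FROM gaia.dr2epochflux
      JOIN gaia.dr2light USING (source_id)
      WHERE parallax > 50;

    -- what the planner believes about the parallax distribution:
    SELECT n_distinct, correlation, histogram_bounds
    FROM pg_stats
    WHERE schemaname = 'gaia'
      AND tablename = 'dr2light'
      AND attname = 'parallax';

    -- a finer histogram for the skewed column, then re-analyze:
    ALTER TABLE gaia.dr2light ALTER COLUMN parallax SET STATISTICS 1000;
    ANALYZE gaia.dr2light;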