> Note that in the OPs case I'd probably try testing things like turning
> off seqscan, or lowering random_page_cost. I'd also look at
> clustering on the index for the field you're selecting on.
I'm worried that turning off seqscan would distort other queries. (Remember, I don't have control of the individual queries. The user of the application can specify all sorts of queries, and there's an ORM in the middle.)
In order to force Postgres to do an index scan for this query, I had to set random_page_cost to 0. Even 0.1 was not small enough. Alternatively, I could set seq_page_cost to 39. (38 was not big enough.) Again, I'm worried that by using such a big hammer, I would distort Postgres's query planning for other queries.
|>ouglas
P.S. Here is the actual data that some people have been clamoring for:
Slow query:
psql> select * from maindb_astobject
where survey_id = 2
limit 20 offset 1000;
Time: 18073.691 ms
where survey_id = 2
limit 20 offset 1000;
Time: 18073.691 ms
Here's the explanation:
QUERY PLAN
-------------------------------------------------------------------------------------
Limit (cost=47.99..48.95 rows=20 width=153)
-> Seq Scan on maindb_astobject (cost=0.00..3538556.10 rows=73736478 width=153)
Filter: (survey_id = 2)
-------------------------------------------------------------------------------------
Limit (cost=47.99..48.95 rows=20 width=153)
-> Seq Scan on maindb_astobject (cost=0.00..3538556.10 rows=73736478 width=153)
Filter: (survey_id = 2)
The same query made fast by removing Postgres's ability to know a priori what particular value is being searched for:
psql> select * from maindb_astobject join maindb_enumentity
on maindb_astobject.survey_id = maindb_enumentity.id
where entityname = 'MACHO'
limit 20 offet 1000;
Time: 1.638 ms
on maindb_astobject.survey_id = maindb_enumentity.id
where entityname = 'MACHO'
limit 20 offet 1000;
Time: 1.638 ms
Here's the explanation for the above query showing that it is using the index:
QUERY PLAN
--------------------------------------------------------------------------
Limit (cost=164.97..168.27 rows=20 width=215)
-> Nested Loop (cost=0.00..1233523.72 rows=7477081 width=215)
-> Seq Scan on maindb_enumentity (cost=0.00..1.12 rows=1 width=62)
Filter: ((entityname)::text = 'MACHO'::text)
-> Index Scan using maindb_astobject_survey_id on maindb_astobject (cost=0.00..1046595.57 rows=14954162 width=153)
Index Cond: (maindb_astobject.survey_id = maindb_enumentity.id)
--------------------------------------------------------------------------
Limit (cost=164.97..168.27 rows=20 width=215)
-> Nested Loop (cost=0.00..1233523.72 rows=7477081 width=215)
-> Seq Scan on maindb_enumentity (cost=0.00..1.12 rows=1 width=62)
Filter: ((entityname)::text = 'MACHO'::text)
-> Index Scan using maindb_astobject_survey_id on maindb_astobject (cost=0.00..1046595.57 rows=14954162 width=153)
Index Cond: (maindb_astobject.survey_id = maindb_enumentity.id)
P.P.S. Many other queries were very slow due to Postgres wanting to use hash joins and merge joins rather than nested index scans. The nested index scans were, in fact, much faster. I just turned off hash joins and merge joins: problem solved. It might be nice at some point to figure out what is going with Postgres trying to use these fancier joins that turn out to be much slower for us, but that's a worry for another day, since I have a perfectly good work-around at the moment.