Tom,

We have continued to explore the issue, and one of my teammates (copied) has made some interesting additional discoveries. I apparently glossed over a subtle distinction in the query being issued. My originally reported query structure was of the form...

  SELECT * FROM events
    WHERE entity_type_id = XXX
    ORDER BY published_at DESC LIMIT 25;

... but in reality it was more like...

  SELECT * FROM events
    WHERE entity_type_id =
      (SELECT id FROM entity_types WHERE name = ?)
    ORDER BY published_at DESC LIMIT 25;

... and these two queries in fact yield dramatically different cost analyses, so much so that if you switch to the former, by issuing a query for the entity_type_id yourself instead of using a subquery, the system uses the two-part index as hoped.

I suspect this stems in part from the uneven distribution of entity_type_id values in the events table: there are 20-30 distinct values, but two or three of them account for a very large share of the rows (and Postgres only seems to track the fraction taken by each of the top ten or so values). For the query that originated my consternation, I presume the planner reasoned, "I don't know which value of entity_type_id the subquery will yield, so I'll assume an average density based on everything I've seen in the table" (which probably really means only the top ten or so values it has seen), whereas when we hard-code the entity_type_id by doing the subquery ourselves beforehand, it reasons, "that value must be either really rare or non-existent, because I haven't seen it since my last ANALYZE, and this table is huge." Maybe this is an inherent limitation of the query planner, in that it does not want to actually execute subqueries during planning in order to make more informed choices about the larger query?
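For concreteness, the two-step workaround we are leaning on looks something like the following (the 'comment' type name and the id 7 are just illustrative stand-ins):

  -- Step 1: resolve the type id in the application beforehand
  SELECT id FROM entity_types WHERE name = 'comment';

  -- Step 2: embed the resulting literal (assume it was 7), so the
  -- planner can consult its per-value statistics for entity_type_id
  SELECT * FROM events
    WHERE entity_type_id = 7
    ORDER BY published_at DESC LIMIT 25;

With the literal in place, the planner can check the value against the column's most-common-values list instead of falling back on an average-selectivity guess.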
We restored a back-up of the system onto another machine, ran the conversion to Postgres 9, cranked the statistics collection configuration all the way up, ran ANALYZE, and still got the same results. This leads me to believe that there is an issue with the query planner's ability to do statistical analysis for a column in a WHERE clause whose value is supplied by a subquery. (Our entity_types table is extremely small, and presumably thus always in memory, so the subquery would be insanely cheap; but I appreciate that we're way down in the weeds of query planning at this point, and that there may be fundamental problems with issuing actual queries in order to do exploratory query planning.)

We (Scott, really) continued to explore this (using the original query, not the tweaked one) with a mix of alternately dropping indexes, tuning execution-cost configuration parameters, and clearing the OS cache between queries. One outcome was the realization that random_page_cost is the dominant factor in the plan involving the two-part index: cutting it from the default of 4 down to 2 cuts the cost of using the two-part index almost exactly in half, and causes that index to be used, even though the planner is over-estimating the prevalence of the column value, presumably because it does not know how the subquery will play out.

This brings me back to my musings about the Postgres b-tree index implementation... Why should using the two-part index, with the WHERE clause fixing the first column's value, yield a query with more random I/O than walking the single-column index and filtering out the non-matching rows?
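In case anyone wants to reproduce the experiment: the cost parameter can be changed per-session, so the two plans can be compared without touching postgresql.conf (the type name below is again just a stand-in):

  SET random_page_cost = 2;  -- default is 4
  EXPLAIN
    SELECT * FROM events
      WHERE entity_type_id =
        (SELECT id FROM entity_types WHERE name = 'comment')
      ORDER BY published_at DESC LIMIT 25;
  RESET random_page_cost;

Running the EXPLAIN once before the SET and once after shows the two-part index's estimated cost dropping roughly in half, as described above.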
Given my understanding of index implementation, it seems like using the two-part index, even in the degenerate case of a table with only one entity_type_id value, would yield almost exactly the same I/O load as using the one-part index, and so any statistical distribution of the table at all better than that degenerate case should cause selection of the two-part index. This makes me think that either there is a second query planner issue here, or my understanding of the implementation of b-tree indexes in Postgres is flawed.

It seems obvious that we need to tweak the cost configuration parameters in our Postgres installation, at the least lowering random_page_cost to something more in line with the hardware we have, but even that feels like merely skirting the real issue, which is either a subtle flaw in the planner or a major flaw in my understanding of b-tree index implementation. Mind you, I raise these issues as someone who profoundly loves Postgres, though perhaps is loving it too hard these days.

I would really like to get a fuller understanding of what is happening here so as to craft a permanent solution. I am worried that even if we tweak one or more of the cost configuration parameters, it might still be prudent to issue the subquery's look-up prior to the main query and then embed its result, so that the query planner can act with better knowledge of the specified entity_type_id value's prevalence in the events table, even though that feels a little bit like a hack.

Any insights would be greatly appreciated.

-- AWG

On Tue, Apr 08, 2014 at 09:55:38AM -0400, Tom Lane wrote:
> "Andrew W. Gibbs" <awgibbs@xxxxxxxxxxx> writes:
> > A very common query against this table is of the form...
>
> > SELECT * FROM events WHERE entity_type_id = XXX ORDER BY published_at DESC LIMIT 25;
>
> > ...
> > to get the most recent 25 events from the table for a given type
> > of entity, and generally the query planner does the expected thing of
> > using the two-part index on (entity_type_id, published_at). Every now
> > and again, though, I have found the query planner deciding that it
> > ought use the single column (published_at) index.
>
> What is the estimated rows count according to EXPLAIN when it does that,
> versus when it chooses the better plan?
>
> > FLAW, we're running on 8.4.X and using the out-of-the-box
> > default_statistics_target setting and haven't dabbled with setting
> > table level statistics configurations.
>
> 8.4.X is due to reach EOL in July, so you really ought to be thinking
> about an upgrade. It's not clear from the given info whether this issue
> is fixable with stats configuration adjustments, is a bug already fixed
> in later versions, or neither, but we're unlikely to make any significant
> changes in the 8.4 planner code at this point...
>
> 			regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance