Tom,

We have continued to explore the issue, and one of my teammates (copied) has made some interesting additional discoveries. I apparently glossed over a subtle distinction in the query being issued. My originally reported query structure was of the form...

  SELECT * FROM events
    WHERE entity_type_id = XXX
    ORDER BY published_at DESC LIMIT 25;

... but in reality it was more like...

  SELECT * FROM events
    WHERE entity_type_id =
      (SELECT id FROM entity_types WHERE name = ?)
    ORDER BY published_at DESC LIMIT 25;

... and these two queries in fact yield dramatically different cost analyses, so much so that if you switch to the former, by issuing a query for the entity_type_id yourself instead of using a subquery, the system uses the two-part index as hoped.

I suspect this stems in part from the uneven distribution of entity_type_id values in the events table: there are 20-30 distinct values, but two or three of them account for a very large share of the rows (and Postgres only seems to track the fraction taken by each of the top ten or so values). For the query that originated my consternation, I presume the planner reasoned, "I don't know which value of entity_type_id the subquery will yield, so I'll assume an average density based on everything I've seen in the table" (which probably really means only the top ten or so values it has seen), whereas when we hard-code the entity_type_id by doing the subquery ourselves beforehand, it reasons, "that value must be either really rare or non-existent, because I haven't seen it since my last ANALYZE, and this table is huge." Maybe this is an inherent limitation of the query planner, in that it does not want to actually execute subqueries during planning in order to make more informed choices about the larger query?
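For concreteness, the two-step workaround we are leaning on looks something like the following (the 'comment' type name and the id 7 are just illustrative stand-ins):

  -- Step 1: resolve the type id in the application beforehand
  SELECT id FROM entity_types WHERE name = 'comment';

  -- Step 2: embed the resulting literal (assume it was 7), so the
  -- planner can consult its per-value statistics for entity_type_id
  SELECT * FROM events
    WHERE entity_type_id = 7
    ORDER BY published_at DESC LIMIT 25;

With the literal in place, the planner can check the value against the column's most-common-values list instead of falling back on an average-selectivity guess.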
We restored a back-up of the system onto another machine, ran the conversion to Postgres 9, cranked the statistics collection configuration all the way up, ran ANALYZE, and still got the same results. This leads me to believe that there is an issue with the query planner's ability to do statistical analysis for a column in a WHERE clause whose value is supplied by a subquery. (Our entity_types table is extremely small, and presumably thus always in memory, so the subquery would be insanely cheap; but I appreciate that we're way down in the weeds of query planning at this point, and that there may be fundamental problems with issuing actual queries in order to do exploratory query planning.)

We (Scott, really) continued to explore this (using the original query, not the tweaked one) with a mix of alternately dropping indexes, tuning execution-cost configuration parameters, and clearing the OS cache between queries. One outcome was the realization that random_page_cost is the dominant factor in the plan involving the two-part index: cutting it from the default of 4 down to 2 cuts the cost of using the two-part index almost exactly in half, and causes that index to be used, even though the planner is over-estimating the prevalence of the column value, presumably because it does not know how the subquery will play out.

This brings me back to my musings about the Postgres b-tree index implementation... Why should using the two-part index, with the WHERE clause fixing the first column's value, yield a query with more random I/O than walking the single-column index and filtering out the non-matching rows?
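In case anyone wants to reproduce the experiment: the cost parameter can be changed per-session, so the two plans can be compared without touching postgresql.conf (the type name below is again just a stand-in):

  SET random_page_cost = 2;  -- default is 4
  EXPLAIN
    SELECT * FROM events
      WHERE entity_type_id =
        (SELECT id FROM entity_types WHERE name = 'comment')
      ORDER BY published_at DESC LIMIT 25;
  RESET random_page_cost;

Running the EXPLAIN once before the SET and once after shows the two-part index's estimated cost dropping roughly in half, as described above.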
Given my understanding of index implementation, it seems like using the two-part index, even in the degenerate case of a table with only one entity_type_id value, would yield almost exactly the same I/O load as using the one-part index, and so any statistical distribution of the table at all better than that degenerate case should cause selection of the two-part index. This makes me think that either there is a second query planner issue here, or my understanding of the implementation of b-tree indexes in Postgres is flawed.

It seems obvious that we need to tweak the cost configuration parameters in our Postgres installation, at the least lowering random_page_cost to something more in line with the hardware we have, but even that feels like merely skirting the real issue, which is either a subtle flaw in the planner or a major flaw in my understanding of b-tree index implementation. Mind you, I raise these issues as someone who profoundly loves Postgres, though perhaps is loving it too hard these days.

I would really like to get a fuller understanding of what is happening here so as to craft a permanent solution. I am worried that even if we tweak one or more of the cost configuration parameters, it might still be prudent to issue the subquery's look-up prior to the main query and then embed its result, so that the query planner can act with better knowledge of the specified entity_type_id value's prevalence in the events table, even though that feels a little bit like a hack.

Any insights would be greatly appreciated.

-- AWG

On Tue, Apr 08, 2014 at 09:55:38AM -0400, Tom Lane wrote:
> "Andrew W. Gibbs" <awgibbs@xxxxxxxxxxx> writes:
> > A very common query against this table is of the form...
>
> > SELECT * FROM events WHERE entity_type_id = XXX ORDER BY published_at DESC LIMIT 25;
>
> > ...
> > to get the most recent 25 events from the table for a given type
> > of entity, and generally the query planner does the expected thing of
> > using the two-part index on (entity_type_id, published_at). Every now
> > and again, though, I have found the query planner deciding that it
> > ought use the single column (published_at) index.
>
> What is the estimated rows count according to EXPLAIN when it does that,
> versus when it chooses the better plan?
>
> > FLAW, we're running on 8.4.X and using the out-of-the-box
> > default_statistics_target setting and haven't dabbled with setting
> > table level statistics configurations.
>
> 8.4.X is due to reach EOL in July, so you really ought to be thinking
> about an upgrade. It's not clear from the given info whether this issue
> is fixable with stats configuration adjustments, is a bug already fixed
> in later versions, or neither, but we're unlikely to make any significant
> changes in the 8.4 planner code at this point...
>
> 			regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance