query against large table not using sensible index to find very small amount of data

"Andrew W. Gibbs" <awgibbs@xxxxxxxxxxx> · Tue, 8 Apr 2014 08:48:35 -0400

I have a fairly large table (~100M rows), let's call it "events", and
among other things it has a couple of columns on it, columns that
we'll call entity_type_id (an integer) and and published_at (a
timestamp).  It has, among others, indices on (published_at) and
(entity_type_id, published_at).

A very common query against this table is of the form...

SELECT * FROM events WHERE entity_type_id = XXX ORDER BY published_at DESC LIMIT 25;

... to get the most recent 25 events from the table for a given type
of entity, and generally the query planner does the expected thing of
using the two-part index on (entity_type_id, published_at).  Every now
and again, though, I have found the query planner deciding that it
ought use the single column (published_at) index.  This can,
unsurprisingly, result in horrendous performance if events for a given
entity type are rare, as we end up with a very long walk of an index.

I had this happen again yesterday and I noticed something of
particular interest pertaining to the event.  Specifically, the query
was for an entity type that the system had only seen for the first
time one day prior, and furthermore the events table had not been
analyzed by the statistics collector for a couple of weeks.

My intuition is that the query planner, when working with an enormous
table, and furthermore encountering an entity type that the statistics
collector had never previously seen, would assume that the number of
rows in the events table of that entity type would be very small, and
therefore the two-part index on (entity_type_id, published_at) would
be the right choice.  Nonetheless, an EXPLAIN was showing usage of the
(published_at) index, and since there were only ~20 rows in the entire
events table for that entity type the queries were getting the worst
possible execution imaginable, i.e. reading in the whole table to find
the rows that hit, but doing it with the random I/O of an index walk.

As an experiment, I ran a VACUUM ANALYZE on the events table, and then
re-ran the EXPLAIN of the query, and...  Same query plan again...
Maybe for whatever issue I am having the random sampling nature of the
statistics collector made it unhelpful, i.e. in its sampling of the
~100M rows it never hit a single row that had the new entity type
specified?

Other possibly relevant pieces of information...  The entity type
column has a cardinality in the neighborhood of a couple dozen.
Meanwhile, for some of the entity types there is a large and ongoing
number of events, and for other entity types there is a smaller and
more sporadic number of events.  Every now and again a new entity type
shows up.

I can't understand why the query planner would make this choice.
Maybe it has gotten ideas into its head about the distribution of
data?  Or maybe there is a subtle bug that my data set is triggering?
Or maybe I need to turn some knobs on statistics collection?  Or maybe
it's all of these things together?  I worry that even if there is a
knob turning exercise that helps that we're still going to get burned
whenever a new entity type shows up until we re-run ANALYZE, assuming
that I can find a fix that involves tweaking statistics collection.  I
just can't fathom how it would ever be the case that Postgres's choice
of index usage in this case would make sense.  It doesn't even slot
cleanly into the problem space of "why did Postgres do a sequential
scan instead of an index scan?".  If you're doing a query of the
described form and the entity type is specified, wouldn't the two-part
index theoretically _always_ yield better performance than the
one-part index?  Maybe I have a flawed understanding of the cost of
using various indexes?  Maybe there is something analogous between
sequential-versus-index-scan and one-part-versus-two-part-index scan
choices?

FWIW, we're running on 8.4.X and using the out-of-the-box
default_statistics_target setting and haven't dabbled with setting
table level statistics configurations.

Thoughts?  Recommended reading?

  -- AWG

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance