Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr

David G Johnston <david.g.johnston@xxxxxxxxx> · Tue, 5 Aug 2014 16:30:18 -0700 (PDT)

Shaun Thomas-2 wrote
>> Also, that doesn't make sense to me, since we don't have 2.5mil rows
>> that match this one SpawnID.  Could this suggest that my partial
>> hstore index is somehow misconstructed?  Or is that saying that
>> 2.5mil rows have a SpawnID, not all of which will be the one I'm
>> looking for?
> 
> I'm not sure. But according to that explain analyze, it matched 2.5 
> million... somethings. Perhaps it has to do with how hstore is actually 
> indexed. Maybe it starts at SpawnID, then narrows it down with 
> SpawnID-428870395.258592, but can't do so until it fetches the SpawnID 
> part. I'll stop commenting on hstore anything, because I never use it.

The presence of hstore in a red herring here.  The index itself is storing
the result of "hstore => 'key'" which has a data type of "text".  Note that
the use of a "btree" index reinforces this since (IIRC) a direct hstore
index would require gin/gist.

*Recommendation:* Create a multi-column index instead of (or in addition to)
the two single column ones.

Theorizing out loud here...take with a grain of salt - and experiment.

What is happening is that the entire (id) index is being bitmapped (hashed,
basically) so that it can be combined with the other (date) index (also
being bitmapped).  Since this process operates at the page level the bitmap
of an ID that is frequently used throughout time is going to have a very low
hit ratio (matched tuples / pages) and also result in a high recheck loss
rate.  The ID seems likely to have this problem - the Date index likely not
so much.

NOTE: I am confused by this line:
->  BitmapAnd  (cost=291564.31..291564.31 rows=28273 width=0) (actual
time=23843.870..23843.870 rows=0 loops=1)

How did actual match zero rows?  It should be something like 2.2M

Anyway, you should probably experiment with creating a multi-column index
instead of allowing PostgreSQL to BitmapAnd them together.  Likely the
timestamp will have higher cardinality and so should be listed first in the
index.  This is the only way it can combine two indexes and it thinks this
is better than using the "Date" index and then scanning every single page
for matching IDs.  

Experimentation (or more knowledgeable insight) into this particular
decision is worthwhile...

I would probably still increase statistic targets but I do not think that
will help as much as a compound index.

Note that using a CTE (which is kinda like an explicit BitmapAnd) is not
likely to be any better than an implicit BitmapAnd - but it doesn't hurt to
check.

I'm not sure how a partial index would help here...though maybe
partitioning...especially since a partial index on date doesn't typically
make sense since time is constantly changing.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/understanding-why-two-nearly-identical-queries-take-two-different-planner-routes-one-5s-and-one-2hr-tp5813819p5813840.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.