Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr

Shaun Thomas <sthomas@xxxxxxxxxxxxxxxx> · Tue, 5 Aug 2014 17:23:57 -0500

On 08/05/2014 04:08 PM, john gale wrote:

Most of the planner options haven't diverged from default, so
default_statistics_target is still set to 100.  I'm vaguely
understanding the docs on that variable, but if we have the space it
sounds like we should bump this up significantly to accommodate more
statistics?

Yeah. You'll want to bump that up. However, I'm not sure how much of a 
difference that'll make to the hstore stuff due to the blobby nature of 
that type of data.

Also, that doesn't make sense to me, since we don't have 2.5mil rows
that match this one SpawnID.  Could this suggest that my partial
hstore index is somehow misconstructed?  Or is that saying that
2.5mil rows have a SpawnID, not all of which will be the one I'm
looking for?

I'm not sure. But according to that explain analyze, it matched 2.5 
million... somethings. Perhaps it has to do with how hstore is actually 
indexed. Maybe it starts at SpawnID, then narrows it down with 
SpawnID-428870395.258592, but can't do so until it fetches the SpawnID 
part. I'll stop commenting on hstore anything, because I never use it.

Not necessarily 300k records, but yes we essentially have one 100G+
table that we pull results to analyze from, whether it's 30 records
or 3k.  300k seems like an abnormality but that's probably why I
jumped on this one particular query in the first place.

Yeah. Like I said, it's all of that preliminary work that's inflating 
your execution time. Something tells me that better stats might help you 
out. Increase default_statistics_target to 300 or 400, analyze the 
table, and try again. See what happens.

Interestingly, make the select clause *less* specific by removing the
started_at part has sped this up quite a bit (if I'm reading the
explain correctly, which I don't really have a history of doing):

You might think that, but not really. :) Consider this:

->  Index Scan using index_testruns_on_custom_spawnid on testruns
(cost=0.57..306430.20 rows=319935 width=1399) (actual
time=0.030..177067.785 rows=348672 loops=1)

That's the only index scan it applied. By not using the much larger 
index with far more hits (5-million for the previous "more specific" 
query) it did less work. Something tells me that the 
index_testruns_on_started_at_2 index is confusing the planner into doing 
way more than it should.

You might be better off by using this query in a CTE (WITH syntax) and 
then using a post-processing WHERE clause to filter to the proper time 
range. Tricking the planner that way is kinda gross, but it works. This 
is assuming your stat adjustment doesn't work.

I've also been thinking about creating partial date indexes since
that's one of the few easy buckets to shove things into, which seems
like it would cut down on the searchable index size.

Properly applying partial indexes without overdoing it is a delicate 
dance, but it can pay off in spades. Try it, it might work out. ;)

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@xxxxxxxxxxxxxxxx

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email