Craig Ringer <ringerc@xxxxxxxxxxxxx> writes:
> This has me wondering about putting together a maintenance/analysis tool
> that generates and captures stats from several ANALYZE runs and compares
> them to see if they're reasonably consistent.  It then re-runs with
> higher targets as a one-off, again to see if the stats agree, before
> restoring the targets to defaults.  The tool could crunch comparisons of
> the resulting stats and warn about tables or columns where the default
> stats targets aren't sufficient.

It would certainly be useful to have such a tool, but I suspect it's
easier said than done (the first sketch below shows one way to snapshot
and compare the numbers).  The problem is to know whether the queries on
that table are particularly sensitive to having better stats.  I think
we've largely solved issues having to do with the quality of the
histogram (eg, what fraction of the table has values falling into some
range), and the remaining trouble spots have to do with predicting the
frequency of specific values that are too infrequent to have made it
into the most-common-values list.

Enlarging the MCV list helps not so much by getting these long-tail
values into the MCV list --- they probably still aren't there --- as by
allowing us to tighten the upper bound on what the frequency of an
unrepresented value must be (the second sketch below works that bound
out from pg_stats).  So what you have to ask is how many queries care
about that.  Craig James' example query in this thread is sort of a
worst case, because the values it's searching for are in fact not in
the table at all.

			regards, tom lane
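
To make the snapshot-and-compare idea concrete, here is a minimal,
untested sketch.  The schema filter ('public') and the 1.5x wobble
threshold are arbitrary placeholders, and a real tool would also want
to diff the MCV contents and histogram bounds, not just n_distinct:

    -- Keep a snapshot of pg_stats after each ANALYZE run.
    CREATE TABLE stats_snapshot AS
    SELECT now() AS taken_at, schemaname, tablename, attname,
           null_frac, n_distinct,
           most_common_vals::text AS mcv,
           most_common_freqs
    FROM pg_stats
    WHERE false;        -- empty shell, just to capture the columns

    -- After every ANALYZE:
    INSERT INTO stats_snapshot
    SELECT now(), schemaname, tablename, attname,
           null_frac, n_distinct, most_common_vals::text,
           most_common_freqs
    FROM pg_stats
    WHERE schemaname = 'public';

    -- Columns whose n_distinct estimate wobbles between runs are the
    -- ones worth retrying with a higher target.  (abs() because a
    -- negative n_distinct means "fraction of rows", per the docs.)
    SELECT tablename, attname,
           min(abs(n_distinct)) AS lo, max(abs(n_distinct)) AS hi
    FROM stats_snapshot
    GROUP BY tablename, attname
    HAVING max(abs(n_distinct)) > 1.5 * min(abs(n_distinct));

Raising the target for a flagged column and re-checking is then just

    ALTER TABLE some_table ALTER COLUMN some_col SET STATISTICS 1000;
    ANALYZE some_table;
    -- ... snapshot and compare again, then restore the default:
    ALTER TABLE some_table ALTER COLUMN some_col SET STATISTICS -1;

where some_table/some_col stand in for whatever the comparison flagged.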
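
As for the bound itself: for an equality probe against a value that is
not in the MCV list, the planner essentially spreads the not-MCV,
not-null fraction of the table evenly over the distinct values it
hasn't listed.  A simplified version of that estimate can be read
straight out of pg_stats (this ignores the clamp against the least
common MCV frequency and handles only the positive-n_distinct
convention, so treat it as an approximation):

    SELECT tablename, attname,
           (1 - null_frac
              - coalesce((SELECT sum(f)
                          FROM unnest(most_common_freqs) AS f), 0))
           / nullif(n_distinct
                    - coalesce(array_length(most_common_freqs, 1), 0),
                    0)
             AS est_frac_of_unlisted_value
    FROM pg_stats
    WHERE tablename = 'some_table'   -- placeholder table name
      AND n_distinct > 0;

A bigger stats target grows the MCV list, which both raises the summed
MCV frequency in the numerator and raises the count of listed values in
the denominator, so the per-unlisted-value estimate --- the bound ---
shrinks.  That is why it only pays off for queries that actually probe
long-tail values, or, as in Craig James' case, values absent from the
table entirely.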