On Mon, May 01, 2006 at 07:35:02PM -0400, Tom Lane wrote:
> Nolan Cafferky <Nolan.Cafferky@xxxxxxxxxxxxxxxxxx> writes:
> > But, I'm guessing that random_page_cost = 1 is not a realistic value.
>
> Well, that depends. If all your data can be expected to fit in memory
> then it is a realistic value. (If not, you should be real careful not
> to make performance decisions on the basis of test cases that *do* fit
> in RAM...)
>
> In any case, if I recall your numbers correctly you shouldn't need to
> drop it nearly that far to get the thing to make the right choice.
> A lot of people run with random_page_cost set to 2 or so.

Also, the comments in the index scan cost estimator say that it does a
linear interpolation between the estimated cost for a perfectly
correlated table and the cost for a table with 0 correlation, but in
fact the interpolation is quadratic: it's linear in the *square* of the
correlation, which just doesn't make much sense.

I did some investigating on this some time ago, but never got very far
with it. http://stats.distributed.net/~decibel/summary.txt has some
info, and http://stats.distributed.net/~decibel/ has the raw data.
Graphing that data, if you only include correlations between 0.36 and
0.5, index scan time appears to vary linearly with the correlation
itself. Of course this is very coarse data, and it'd be great if
someone did more research in this area, preferably using pgbench or
other tools to generate the data so that others can test this stuff as
well. But even as rough as this data is, it's a decent indication that
it would be better to interpolate linearly on the correlation rather
than on correlation^2. This is a production machine, so I'd rather not
go mucking about with testing such a change here.
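To make the difference concrete, here is a minimal standalone sketch of
the two interpolation schemes. This is *not* the actual costsize.c
source; min_io_cost, max_io_cost, and the surrounding structure are
simplified for illustration:

    #include <stdio.h>

    /*
     * min_io_cost: estimated I/O cost if the heap were perfectly
     *              correlated with the index (correlation = 1.0)
     * max_io_cost: estimated I/O cost if the heap were completely
     *              uncorrelated (correlation = 0.0)
     */
    static double
    interpolate_squared(double min_io_cost, double max_io_cost,
                        double correlation)
    {
        /* What the planner effectively does: weight by correlation^2,
         * so the estimate stays near the worst case until the
         * correlation is quite high. */
        double csquared = correlation * correlation;

        return max_io_cost + csquared * (min_io_cost - max_io_cost);
    }

    static double
    interpolate_linear(double min_io_cost, double max_io_cost,
                       double correlation)
    {
        /* The alternative suggested above: weight by the correlation
         * itself. */
        return max_io_cost + correlation * (min_io_cost - max_io_cost);
    }

    int
    main(void)
    {
        double min_io = 100.0;   /* hypothetical best-case cost */
        double max_io = 10000.0; /* hypothetical worst-case cost */
        double c;

        for (c = 0.0; c <= 1.0001; c += 0.25)
            printf("corr=%.2f  squared=%8.1f  linear=%8.1f\n",
                   c,
                   interpolate_squared(min_io, max_io, c),
                   interpolate_linear(min_io, max_io, c));
        return 0;
    }

With those made-up costs, at correlation 0.5 the squared version still
charges about 75% of the worst-case cost while the linear version
charges about 50%, which gives a feel for the size of the effect.

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@xxxxxxxxxxxxx
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461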