On Mon, May 01, 2006 at 07:35:02PM -0400, Tom Lane wrote:
> Nolan Cafferky <Nolan.Cafferky@xxxxxxxxxxxxxxxxxx> writes:
> > But, I'm guessing that random_page_cost = 1 is not a realistic value.
>
> Well, that depends. If all your data can be expected to fit in memory
> then it is a realistic value. (If not, you should be real careful not
> to make performance decisions on the basis of test cases that *do* fit
> in RAM...)
>
> In any case, if I recall your numbers correctly you shouldn't need to
> drop it nearly that far to get the thing to make the right choice.
> A lot of people run with random_page_cost set to 2 or so.

Also, the comments in the index scan cost estimator say that it does a
linear interpolation between the estimated cost for a perfectly
correlated table and the cost for a table with 0 correlation, but in
fact the interpolation is quadratic: it's linear in the *square* of the
correlation, which just doesn't make much sense.

I did some investigating on this some time ago, but never got very far
with it. http://stats.distributed.net/~decibel/summary.txt has some
info, and http://stats.distributed.net/~decibel/ has the raw data.
Graphing that data, if you only include correlations between 0.36 and
0.5, index scan time appears to vary linearly with the correlation
itself. Of course this is very coarse data, and it'd be great if
someone did more research in this area, preferably using pgbench or
other tools to generate the data so that others can test this stuff as
well. But even as rough as this data is, it's a decent indication that
it would be better to interpolate linearly on the correlation rather
than on correlation^2. This is a production machine, so I'd rather not
go mucking about with testing such a change here.
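To make the difference concrete, here is a minimal standalone sketch of
the two interpolation schemes. This is *not* the actual costsize.c
source; min_io_cost, max_io_cost, and the surrounding structure are
simplified for illustration:

    #include <stdio.h>

    /*
     * min_io_cost: estimated I/O cost if the heap were perfectly
     *              correlated with the index (correlation = 1.0)
     * max_io_cost: estimated I/O cost if the heap were completely
     *              uncorrelated (correlation = 0.0)
     */
    static double
    interpolate_squared(double min_io_cost, double max_io_cost,
                        double correlation)
    {
        /* What the planner effectively does: weight by correlation^2,
         * so the estimate stays near the worst case until the
         * correlation is quite high. */
        double csquared = correlation * correlation;

        return max_io_cost + csquared * (min_io_cost - max_io_cost);
    }

    static double
    interpolate_linear(double min_io_cost, double max_io_cost,
                       double correlation)
    {
        /* The alternative suggested above: weight by the correlation
         * itself. */
        return max_io_cost + correlation * (min_io_cost - max_io_cost);
    }

    int
    main(void)
    {
        double min_io = 100.0;   /* hypothetical best-case cost */
        double max_io = 10000.0; /* hypothetical worst-case cost */
        double c;

        for (c = 0.0; c <= 1.0001; c += 0.25)
            printf("corr=%.2f  squared=%8.1f  linear=%8.1f\n",
                   c,
                   interpolate_squared(min_io, max_io, c),
                   interpolate_linear(min_io, max_io, c));
        return 0;
    }

With those made-up costs, at correlation 0.5 the squared version still
charges about 75% of the worst-case cost while the linear version
charges about 50%, which gives a feel for the size of the effect.

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@xxxxxxxxxxxxx
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461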