On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote: > On 4/1/06, Brendan Duddridge <brendan@xxxxxxxxxxxxxx> wrote: > > Hi Jim, > > > > I'm not quite sure what you mean by the correlation of category_id? > > It means how many distinct values does it have (at least that's my > understanding of it ;) ). Your understanding is wrong. :) What you're discussing is n_distinct. http://www.postgresql.org/docs/8.1/interactive/view-pg-stats.html correlation: "Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk. (This column is NULL if the column data type does not have a < operator.)" In other words, the following will have a correlation of 1: 1 2 3 ... 998 999 1000 And this is -1... 1000 999 ... 2 1 While this would have a very low correlation: 1 1000 2 999 ... The lower the correlation, the more expensive an index scan is, because it's more random. As I mentioned, I believe that the current index scan cost estimator is flawed though, because it will bias heavily against correlations that aren't close to 1 or -1. So, what does SELECT * FROM pg_stats WHERE tablename='table' AND attname='category_id'; show? -- Jim C. Nasby, Sr. Engineering Consultant jnasby@xxxxxxxxxxxxx Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461