Csaba Nagy <nagy@xxxxxxxxxxxxxx> writes: > On Wed, 2007-09-26 at 11:22 -0400, Tom Lane wrote: >> ... how >> many values of "a" are there really, and what's the true distribution of >> counts? > table_a has 23366 distinct values. Some statistics (using R): >> summary(table_a_histogram) > a count > Min. : 70000857 Min. : 1 > 1st Qu.:700003628 1st Qu.: 9 > Median :700011044 Median : 22 > Mean :622429573 Mean : 17640 > 3rd Qu.:700018020 3rd Qu.: 391 > Max. :800003349 Max. :3347707 Ugh, classic long-tail distribution. This is a really hard problem to solve by sampling --- the sample will naturally be biased towards the more common values, and so ANALYZE tends to conclude there are fewer distinct values than there really are. That means n_distinct in the stats is too small, and that feeds directly into the misestimation of the number of matching rows. And yet there's another trap here: if the parameter you passed in chanced to be one of the very common values, a plan that was optimized for a small number of matches would perform terribly. We've speculated about trying to deal with these types of situations by switching plans on-the-fly at runtime, but that's just blue-sky dreaming at the moment. In the short run, if boosting the stats target doesn't result in acceptable plans, there may be no real solution other than to avoid parameterized queries on this column. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match