Re: Deceiding which index to use

Richard Huxton <dev@xxxxxxxxxxxx> · Fri, 09 Mar 2007 15:21:20 +0000

Mezei Zoltán wrote:
Richard Huxton wrote:

And does the planner know that?
SELECT * FROM pg_stats WHERE tablename='subscriber' AND attname='anumber';
It's the n_distinct you're interested in, and perhaps most_common_freqs.

n_distinct is -0.359322 and most_common_vals contains about 10 different 
anumbers (which are corretct), most_common_freqs are between 0.01 and 0.001. 
What does n_distinct exactly mean? Why is it negative?

It's saying that it's a ratio, so if you doubled the number of 
subscribers it would expect that the number of unique anumber's would 
double too. So you've got about 36% of the rows with unique values - 
pretty much what you said earlier. That's not bad, since the planner 
only uses an estimate.

OK - so the next place to look is the distribution of values for 
subscriber_id on the output_message_log. Does that have some subscribers 
with many rows and lots with hardly any? If so, you might need to 
increase the stats on that column:

ALTER TABLE output_message_log ALTER COLUMN subscriber_id SET STATISTICS 
<num>;
ANALYSE output_message_log (subscriber_id);

The <num> defaults to 10, but can be set as high as 1000. You want to 
try and capture the "big" subscribers.

--
  Richard Huxton
  Archonet Ltd