Re: Worse perfomance on 8.2.0 than on 7.4.14

Rolf Østvik <rolfostvik@xxxxxxxx> · Fri, 5 Jan 2007 19:28:33 +0100 (CET)

--- Tom Lane <tgl@xxxxxxxxxxxxx> skrev:

> The number-of-matching-rows estimate has gone up by a factor of 10,
> which undoubtedly has a lot to do with the much higher cost estimate.
> Do you have any idea why that is ... is the table really the same size
> in both servers?  If so, could we see the pg_stats row for
> step_result_subset.uut_result on both servers?

Table step_result_subset and uut_result_subset in both databases is created from same schema
definition file and filled with data from the same data source file.

==== Server 7.4.14: ====

logistics_74# select count(*) from step_result_subset;
  count   
----------
 17179506
(1 row)

logistics_74# select count(distinct uut_result) from step_result_subset;
 count  
--------
 176450
(1 row)

logistics_74# analyse verbose step_result_subset;
INFO:  analyzing "public.step_result_subset"
INFO:  "step_result_subset": 92863 pages, 3000 rows sampled, 17179655 estimated total rows
ANALYZE

logistics_74# select * from pg_stats where tablename = step_result_subset and
attname='uut_result';
 schemaname |     tablename      |  attname   | null_frac | avg_width | n_distinct |              
         most_common_vals                        |                                              
most_common_freqs                                               |                           
histogram_bounds                             | correlation 
------------+--------------------+------------+-----------+-----------+------------+----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+-------------
 public     | step_result_subset | uut_result |         0 |         4 |      57503 |
{70335,145211,17229,20091,21827,33338,34370,42426,47274,54146} |
{0.001,0.001,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}
| {213,30974,51300,68529,85053,100838,114971,128126,144230,161657,176691} |    0.951364
(1 row)

==== Server 8.2.0: ====

logistics_82# select count(*) from step_result_subset;
  count   
----------
 17179506
(1 row)

logistics_82# select count(distinct uut_result) from step_result_subset;
 count  
--------
 176450
(1 row)

logistics_82# analyse verbose step_result_subset;
INFO:  analyzing "public.step_result_subset"
INFO:  "step_result_subset": scanned 3000 of 92863 pages, containing 555000 live rows and 0 dead
rows; 3000 rows in sample, 17179655 estimated total rows
ANALYZE

logistics_# select * from pg_stats where tablename = step_result_subset and attname='uut_result';
 schemaname |     tablename      |  attname   | null_frac | avg_width | n_distinct |              
         most_common_vals                         |                                               
most_common_freqs                                                |                           
histogram_bounds                             | correlation 
------------+--------------------+------------+-----------+-----------+------------+-----------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+-------------
 public     | step_result_subset | uut_result |         0 |         4 |       6516 |
{35010,111592,35790,41162,56844,57444,60709,73017,76295,106470} |
{0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
| {147,31791,54286,70928,85996,102668,117885,130947,144766,162098,176685} |    0.954647
(1 row)

Then on server 8.2.0 i need to set statistics to ~120 on step_result_subset.uut_result to get
n_distinct to be in same range as n_distinct on 7.4.14.

Even with a statistics value of 1000, the n_distinct value does only reach ~138 000. Is it correct
that _ideally_ the n_distinct value should be the same as "select count(distinct uut_result) from
step_result_subset"? 

====
Even with better statistics on step_result_subset.uut_result neither of 7.4.14 or 8.2.0 manages to
pick the best plan when i want to select bigger datasets (in my examples that would be to set an
earlier date in the where clause for "ur.start_date_time > '2006-12-11'"). I will continue to
adjust other parameters and see what i can manage myself.

Best regards 
Rolf Østvik

__________________________________________________
Bruker du Yahoo!?
Lei av spam?  Yahoo! Mail har den beste spambeskyttelsen 
http://no.mail.yahoo.com