Search Postgresql Archives

Query runtime dependent on ANALYZE run

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

(I've sent a copy of this message to pgus-general in error. Sorry about
that.)

I've noticed that the selection of the executed query plan (and
therefore query runtime) is dependent on the statistics generated by
an ANALYZE run. As an demonstration, I chose the best runtime of 5
consecutive runs of the query linked below, regenerated the statistics
for the column node_annotation.value and re-ran the query. This
experiment was repeated a hundred times each for the statistics
targets 10, 100 (default), 1000, 10000. I've used PostgreSQL 9.1.3.

Query: http://www.informatik.hu-berlin.de/~rosenfel/analyze/query.sql
Schema: http://www.informatik.hu-berlin.de/~rosenfel/analyze/schema.pdf

The next table shows for each statistics target the number of distinct
plans generated in the experiment, how often the most common plan was
generated, the runtime (ms) of the best and worst non-unique plan, and
the number of timeouts where the query did not finish within 60 seconds.

| Statistics | # plans | most common plan | Best | Worst | Timeouts |
|------------+---------+------------------+------+-------+----------|
|         10 |      37 |               60 | 1876 |  2180 |        0 |
|        100 |      90 |                4 | 2225 |  7927 |       14 |
|       1000 |      75 |                6 | 2214 |  6329 |       22 |
|      10000 |       6 |               85 | 2195 |  2900 |        3 |

The distribution for each statistics target is linked below:

Distribution: http://www.informatik.hu-berlin.de/~rosenfel/analyze/histogram.pdf

As one can see, using the default value of 100 (and also 1000) there is
a considerable spread in the runtime of the query. The best plan is
chosen most often, but only about a quarter of the time and there are
also many timeouts. The best results can be achieved with a statistics
target of 10: The most stable query plan is the second best and there
are no timeouts. Using a statistics target of 10000 generates the most
stable plan selection, i.e. the same plan is chosen most often, but it
is almost a second slower than the best plan.

I would like to know how I can mitigate against these random results.

Cheers,
Viktor

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux