Re: Bogus startup cost for WindowAgg

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ants Aasma wrote:
I hit an issue with window aggregate costing while experimenting with
providing a count of the full match along side a limited result set.
Seems that the window aggregate node doesn't take into account that it
has to consume the whole input before outputting the first row. When
this is combined with a limit, the resulting cost estimate is wildly
underestimated, leading to suboptimal plans.

Is this a known issue? I couldn't find anything referring to this on
the mailing list or todo.

What is your histogram size? That's defined by the default_statistics_target in your postgresql.conf.
Check the column histograms like this:

   news=> select attname,array_length(most_common_vals,1)
   from pg_stats
   where tablename='moreover_documents_y2010m09';
          attname        | array_length
   ----------------------+--------------
document_id | dre_reference | headline | 1024
    author               |          212
url | rank | 59
    content              |         1024
stories_like_this | internet_web_site_id | 1024
    harvest_time         |         1024
    valid_time           |         1024
    keyword              |           95
article_id | media_type | 5
    source_type          |            1
    created_at           |         1024
    autonomy_fed_at      |         1024
    language             |           37
   (18 rows)

   news=> show default_statistics_target;
    default_statistics_target
   ---------------------------
    1024
   (1 row)

You will see that for most of the columns, the length of the histogram array corresponds to the value of the default_statistics_target parameter. For those that are smaller, the size is the total number of values in the column in the sample taken by the "analyze" command. The longer histogram, the better plan. In this case, the size does matter. Note that there are no histograms for the document_id and dre_reference columns. Those are the primary and unique keys, the optimizer can easily guess the distribution of values.


--

Mladen Gogala Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions




--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux