Re: LIMIT confuses the planner

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tom Lane wrote:
Kouber Saparev <kouber@xxxxxxxxxxx> writes:
Now the planner believes there're 910 rows, which is a bit closer to the real data:

swing=# select avg(length) from (select username, count(*) as length from login_attempt group by username) as freq;
          avg
----------------------
  491.6087310427555479
(1 row)

Hmph, that's still not real good.  Ideally it should be estimating
*less* than the average frequency, because the estimate is made after
excluding all the most-common-values, which evidently 'kouber' is not
one of.  I suppose there's quite a large number of infrequently-seen
usernames and the ndistinct estimate is much less than reality?  (Look
at the pg_stats row for this column.)  It might be worth going all the
way to stats target 1000 for this column.


I altered the statistics for that column to 1000, so now the planner assumes exactly 492 rows for the fore-mentioned query, which is indeed the average. It never went *less* than that value, it was always higher, i.e. for a statistics value of 600, it was 588, for 800, it became 540.

The current value of n_distinct (given statistics=1000) is:

db=# SELECT n_distinct FROM pg_stats WHERE tablename='login_attempt' AND attname='username';
 n_distinct
------------
       5605
(1 row)

db=# SELECT COUNT(DISTINCT username) FROM login_attempt;
 count
-------
 23391
(1 row)


In fact, what is n_distinct standing for, apart the famous formula:
n*d / (n - f1 + f1*n/N)

;-)

Regards,
--
Kouber Saparev
http://kouber.saparev.com

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux