Re: When to bump up statistics?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dawid,

> I wonder what are the implications of using this statement,
> I know by using, say n=100, ANALYZE will take more time,
> pg_statistics will be bigger, planner will take longer time,
> on the other hand it will make better decisions... Etc, etc.

Yep.   And pg_statistics will need to be vacuumed more often.

> Is bumping up statistics is only useful for indexed columns?

No.   It's potentially useful for any queried column.

> 1) huge table with huge number of distinct values (_almost_
>     unique ;))

Yes.

> 2) huge table with relatively equally distributed values
>     (like each value is in between, say, 30-50 rows).

Not usually.

> 3) huge table with unequally distributed values (some
>     values are in 1-5 rows, some are in 1000-5000 rows).

Yes.

> 4) huge table with small number values (around ~100
>     distinct values, equally or uneqally distributed).

Not usually, especially if they are equally distributed.

> 5) boolean column.

Almost never, just as it is seldom useful to index a boolean column.

> I think SET STATISTICS 100 is very useful for case with
> unequally distributed values, but I wonder what about
> the other cases.  And as a side note -- what are the
> reasonable bounds for statistics (between 10 and 100?)

Oh, no, I've used values up to 500 in production, and we've tested up to the 
max on DBT-3.    In my experience, if the default (10) isn't sufficient, you 
often have to go up to > 250 to get a different plan.

> What are the runtime implications of setting statistics
> too large -- how much can it affect queries?

It won't affect select queries.   It will affect ANALYZE time (substantially 
in the aggregate) and maintenance on the pg_statistics table.

> And finally -- how other RDBMS and RDBM-likes deal
> with this issue? :)

Most don't allow such fine-tuned adjustment.   MSSQL, for example, allows only 
setting it per-table or maybe even database-wide, and on that platform it 
doesn't seem to have much effect on query plans.    Oracle prefers to use 
HINTS, which are a brute-force method to manage query plans.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux