Re: serious under-estimation of n_distinct for clustered distributions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/29/2012 10:57 PM, Peter Geoghegan wrote:
On 29 December 2012 20:57, Stefan Andreatta <s.andreatta@xxxxxxxxxxx> wrote:
...

The general advice here is:

1) Increase default_statistics_target for the column.

I tried that, but to get good estimates under these circumstances, I need to set the statistics_target so high that the whole table gets analyzed. As this problem matters most for all of our large tables, I would have to set default_statistics_target to something like 100000 - that's a bit scary for production systems with tables of appr. 100GB, I find.


2) If that doesn't help, consider using the following DDL:

alter table foo alter column bar set ( n_distinct = 5.0);


Yes, that's probably best - even if it means quite some maintenance work. I do it like that:

ALTER TABLE test_1 ALTER COLUMN clustered_random_2000k SET (n_distinct = -0.05);

btw: Postgres will never set relative n_distinct values for anything larger than -0.1. If I determine (or know) it to be a constant but lower fraction, could it be a problem to explicitly set this value to between -0.1 and 0?


To activate that setting, however, an ANALYZE has to be run. That was not clear to me from the documentation:

 ANALYZE verbose test_1;


To check column options and statistics values:

 SELECT pg_class.relname AS table_name,
        pg_attribute.attname AS column_name, pg_attribute.attoptions
     FROM pg_attribute
         JOIN pg_class ON pg_attribute.attrelid = pg_class.oid
     WHERE pg_attribute.attnum > 0
         AND pg_class.relname = 'test_1'
         AND pg_attribute.attname = 'clustered_random_2000k';

 SELECT tablename AS table_name, attname AS column_name,
        null_frac, avg_width, n_distinct, correlation
     FROM pg_stats
     WHERE tablename = 'test_1' and attname = 'clustered_random_2000k';


And finally, we can undo the whole thing, if necessary:

 ALTER TABLE test_1 ALTER COLUMN clustered_random_2000k RESET (n_distinct);
 ANALYZE VERBOSE test_1;


Regards,
Stefan


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux