Hello,
I got my first hint of why this problem occurs when I looked at the statistics for the column in question, "instrument_ref". With default_statistics_target=500, and after ANALYZE had been run, the statistics claimed:
select n_distinct from pg_stats where attname like 'instr%_ref'; -- Result: 40 000
select count(distinct instrument_ref) from bigtable; -- Result: 33 385 922 (!!)
That is an astonishing difference of almost 1000X (33 385 922 / 40 000 ≈ 835).
I have tried increasing the statistics target to 5000, which helps, but only reduces the error to about 100X. Still crazy high.
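For anyone reproducing this, the target can also be raised for just this one column instead of cluster-wide; a minimal sketch, assuming the bigtable/instrument_ref names above:

ALTER TABLE bigtable ALTER COLUMN instrument_ref SET STATISTICS 5000; -- per-column sample size
ANALYZE bigtable; -- the new target only takes effect at the next ANALYZE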
As far as I know, increasing default_statistics_target will not fully fix this: n_distinct is extrapolated from a random sample, and such estimates are known to be unreliable for columns where most values are rare. [1]
I have considered these fixes:
- hardcode n_distinct to a particular ratio of the total number of rows
You can hardcode the fraction of distinct values (a negative n_distinct is interpreted as a multiplier of the row count, so the estimate scales as the table grows):
ALTER TABLE bigtable ALTER COLUMN instrument_ref SET ( n_distinct = -0.067 ); /* -1 * (33385922 / 500000000) ≈ -0.067 */
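The override is only picked up at the next ANALYZE, so something like the following should apply and confirm it (a sketch, using the same assumed names):

ANALYZE bigtable; -- stores the hardcoded n_distinct in the statistics
select n_distinct from pg_stats where tablename = 'bigtable' and attname = 'instrument_ref'; -- should now report -0.067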
[1] https://www.postgresql.org/message-id/4136ffa0812111823u645b6ec9wdca60b3da4b00499%40mail.gmail.com
-----
Pavel Luzanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company