Re: n_distinct off by a factor of 1000

Fabio Pardi <f.pardi@xxxxxxxxxxxx> · Tue, 23 Jun 2020 16:05:49 +0200



    On 23/06/2020 14:42, Klaudie Willis
      wrote:

    
      I got my first hint of why this problem occurs when I looked
        at the statistics.  For the column in question, "instrument_ref"
        the statistics claimed it to be:

      
      The default_statistics_target=500, and analyze has been run.

      
      select * from pg_stats where attname like 'instr%_ref'; --
        Result: 40.000

      
      select count(distinct instrumentid_ref) from bigtable --
        Result: 33 385 922 (!!)

      
      That is an astonishing difference of almost a 1000X.  

      
    I think you are counting 2 different things here.

    
    The first query returns all the columns "like 'instr%_ref'" present
    in the statistics (so in the whole cluster), while the second is
    counting the actual number of different rows in bigtable.

    
    regards,

    
    fabio pardi