Re: Duplicate Extended Statistics

Asad Ali <asadalinagri@xxxxxxxxx> · Wed, 4 Sep 2024 13:28:45 +0500

Hi Ilia,
In PostgreSQL, it is possible to create duplicate extended statistics because the system does not enforce uniqueness on statistics definitions. However, this is generally not recommended, as it leads to longer ANALYZE times, increased storage usage, potential planner performance impact, and unnecessary complexity. In practice, duplicates are rare because users and tools usually avoid redundancy, as there is no added benefit to having multiple identical sets of statistics on the same columns.

Regards,
Asad Ali

On Tue, Sep 3, 2024 at 6:10 PM Ilia Evdokimov <ilya.evdokimov@xxxxxxxxxxxxxx> wrote:

    Hello everyone,
    I have a question regarding extended statistics in PostgreSQL.
      Why is it possible to create duplicate extended statistics? To
      make it clearer, here’s an example:
    CREATE TABLE t(a int,
      b int);

      INSERT INTO t(a, b) VALUES (...);

      CREATE STATISTICS ON a, b FROM
      t;

      ANALYZE t;

      ....

      CREATE STATISTICS ON a, b FROM
      t;

      ANALYZE t;
    After executing these queries, the following issues might arise:

      ANALYZE will take longer to run because, for example, MCV
        extended statistics would need to be gathered twice.
      Duplicate information will be stored.
      The planner might take longer to find the relevant statistics
        since it has to search through them in a loop.

    Or do duplicate extended statistics practically never occur in
      practice?
    Thanks in advance for your response.

    -- 
Regards,
Ilia Evdokimov,
Tantor Labs LCC.