Re: [EXT] Re: Improve "select count(*)" query - takes more than 30 mins for some large tables

Mladen Gogala <gogala.mladen@xxxxxxxxx> · Tue, 12 Jul 2022 22:54:50 -0400



    On 7/12/22 14:25, MichaelDBA Vitale
      wrote:

    
       That is not true: doing the select on the primary key will
        still result in a table scan, not an index scan.  The heap
        always gets accessed for select counts. 
       

       Regards, 
       Michael Vitale 
    
    Well, not always:
    mgogala=# explain select
        count(*) from emp;

                                QUERY PLAN                        

        ----------------------------------------------------------

         Aggregate  (cost=1.18..1.19 rows=1 width=8)

           ->  Seq Scan on emp  (cost=0.00..1.14 rows=14 width=0)

        (2 rows)

    
    That is the classic "select count(*)".  Now, let's see PK:
    mgogala=# \d emp

                                   Table "mgogala.emp"

          Column  |            Type             | Collation | Nullable |
        Default 

----------+-----------------------------+-----------+----------+---------

         empno    | smallint                    |           | not null |
        

         ename    | character varying(10)       |           |          |
        

         job      | character varying(9)        |           |          |
        

         mgr      | smallint                    |           |          |
        

         hiredate | timestamp without time zone |           |          |
        

         sal      | double precision            |           |          |
        

         comm     | double precision            |           |          |
        

         deptno   | smallint                    |           |          |
        

        Indexes:

            "emp_pkey" PRIMARY KEY, btree (empno)

        Foreign-key constraints:

            "fk_deptno" FOREIGN KEY (deptno) REFERENCES dept(deptno)

      
      Now, let's do scan on "count(empno)":
    mgogala=# explain select
        count(empno) from emp;

                                QUERY PLAN                        

        ----------------------------------------------------------

         Aggregate  (cost=1.18..1.19 rows=1 width=8)

           ->  Seq Scan on emp  (cost=0.00..1.14 rows=14 width=2)

        (2 rows)

    
    So far, so good. Sequential scan, despite the existence of the
      NOT NULL primary key. Now comes the twist:

    
    mgogala=# set
        enable_seqscan=off;

        SET

        mgogala=# explain select count(empno) from emp;

                                           QUERY
        PLAN                                   

         
--------------------------------------------------------------------------------

        -

         Aggregate  (cost=12.38..12.39 rows=1 width=8)

           ->  Index Only Scan using emp_pkey on emp 
        (cost=0.14..12.35 rows=14 width=2)

        (2 rows)

    
    Voila, we've got index only "count" scan. So, in some cases
      Postgres does do index only scan. What is surprising is that
      pg_hint_plan doesn't produce the desired results:
    mgogala=# explain /*+
        IndexScan(emp emp_pkey) */ select count(empno) from emp;

                                          QUERY
        PLAN                                   

-------------------------------------------------------------------------------

         Aggregate  (cost=10000000001.17..10000000001.18 rows=1 width=8)

           ->  Seq Scan on emp  (cost=10000000000.00..10000000001.14
        rows=14 width=2)

         JIT:

           Functions: 3

           Options: Inlining true, Optimization true, Expressions true,
        Deforming true

        (5 rows)

    
    This is strange in more ways than one:
    
      The hint didn't work, Postgres still performed sequential scan
      JIT got involved for the reasons that are completely unclear.
      The cost is enormous, despite the fact the the "emp" table has
        only 14 rows and is fully contained in a single block.

      
    However, if we do GUC hint, the result behaves as expected:
    mgogala=# explain /*+
        set(enable_seqscan off) */ select count(empno) from emp e;

                                            QUERY
        PLAN                                  

           
--------------------------------------------------------------------------------

        ---

         Aggregate  (cost=12.38..12.39 rows=1 width=8)

           ->  Index Only Scan using emp_pkey on emp e 
        (cost=0.14..12.35 rows=14 width=2)

        (2 rows)

    
    So, it is possible to force index only scan. However, from purely
      theoretic point of view, the speed difference is questionable.
      Indexes are stored differently than tables, they have branch
      blocks and data bocks with an ample free space in the data blocks
      to prevent block splits and index level increase, with mandatory
      re-balancing. Indexes are actually quite large structures,
      sometimes even larger than the underlying tables. In general,
      going after the index-only scan doesn't look like a good strategy.
    After all this work, I also have a question: does PostgreSQL read
      index blocks in batches or it does single block scans? Some other
      databases have 2 events: "db file scattered read" for sequential
      scan (called "full table scan" in that specific vernacular) and
      "db file sequential read" for index scans. Those database only do
      single block reads from indexes while the tables can be read using
      batches of blocks. Does PostgreSQL do the same thing?

    
    -- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com