Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters

Cosmin Prund <cprund@xxxxxxxxx> · Thu, 16 Jan 2020 19:18:24 +0200

Hi Tom, and thanks.
Running ANALYZE doesn't change a thing. REINDEXING doesn't change a thing. I know it's an odd choice of plan - that's why I'm here!

I thought I'd just post what felt relevant, hoping it's not something out of the ordinary and I'm just missing something obvious.
Here's lots of data:

===========================================================================

SELECT version()
PostgreSQL 10.10 (Ubuntu 10.10-0ubuntu0.18.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit

===========================================================================

 \d "LucrareBugetDate"
                                                       Table "public.LucrareBugetDate"
           Column           |         Type          | Collation | Nullable |                             Default
----------------------------+-----------------------+-----------+----------+------------------------------------------------------------------
 OrdonatorPrincipalId       | uuid                  |           |          |
 UnitateSubordonataId       | uuid                  |           |          |
 CentralizatorSelectiv      | text                  |           |          |
 IdRand                     | character varying(32) |           |          |
 IdColoana                  | character varying(32) |           |          |
 ClasEc                     | character varying(32) |           |          |
 CodSector                  | character varying(4)  |           |          |
 CodSursa                   | character varying(4)  |           |          |
 Paragraf                   | character varying(16) |           |          |
 Venit                      | character varying(16) |           |          |
 FelValoare                 | integer               |           | not null |
 Valoare                    | numeric               |           | not null |
 RangOperator               | integer               |           | not null |
 OrdineCalcul               | integer               |           | not null |
 ConflictFormuleAlternative | boolean               |           | not null | false
 Sectiune                   | integer               |           |          |
 RefColoana                 | text                  |           |          |
 RefDocument                | text                  |           |          |
 RefLinie                   | text                  |           |          |
 SeqModificare              | integer               |           | not null | 0
 LucrareBugetDateId         | integer               |           | not null | nextval('"LucrareBugetDate_LucrareBugetDateIdV2_seq"'::regclass)
 LucrareBugetVersiuneId     | smallint              |           | not null |
 CentralizatorSelectivId    | uuid                  |           |          |
 Stil                       | text                  |           |          |
 ValoareArhivata            | boolean               |           |          |
Indexes:
    "PK_LucrareBugetDate" PRIMARY KEY, btree ("LucrareBugetVersiuneId", "LucrareBugetDateId")
    "IX_LucrareBugetDate_LucrareBugetVersiuneId_LucrareBugetDateId" btree ("LucrareBugetVersiuneId", "LucrareBugetDateId")
Foreign-key constraints:
    "FK_LucrareBugetDate_LucrareBugetVersiune_LucrareBugetVersiuneId" FOREIGN KEY ("LucrareBugetVersiuneId") REFERENCES "LucrareBugetVersiune"("LucrareBugetVersiuneId") ON DELETE CASCADE

===========================================================================

SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='LucrareBugetDate';
     relname      | relpages |  reltuples  | relallvisible | relkind | relnatts | relhassubclass |   reloptions    | pg_table_size
------------------+----------+-------------+---------------+---------+----------+----------------+-----------------+---------------
 LucrareBugetDate |  2659660 | 4.17124e+07 |        671510 | r       |       25 | f              | {fillfactor=50} |   21793775616
(1 row)

===========================================================================

Does the table have anything unusual about it?
contains large objects: NO
has a large proportion of NULLs in several columns: NO
receives a large number of UPDATEs or DELETEs regularly: YES - Lots of UPDATES but no UPDATES to indexed columns. No DELETE's.
is growing rapidly: I'm inserting millions of records at once but not very often. Have manually done ANALYZE and REINDEX
has many indexes on it: NO
uses triggers that may be executing database functions, or is calling functions directly: NO

  ===========================================================================  
EXPLAIN (ANALYZE, BUFFERS) select "IdRand", "IdColoana", "Valoare" from "LucrareBugetDate" where ("LucrareBugetVersiuneId" = 92) and ("LucrareBugetDateId" in (10,11));
                                                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using "IX_LucrareBugetDate_LucrareBugetVersiuneId_LucrareBugetDateId" on "LucrareBugetDate"  (cost=0.56..2.37 rows=1 width=13) (actual time=0.096..978.398 rows=2 loops=1)
   Index Cond: ("LucrareBugetVersiuneId" = 92)
   Filter: ("LucrareBugetDateId" = ANY ('{10,11}'::integer[]))
   Rows Removed by Filter: 1869178
   Buffers: shared hit=161178
 Planning time: 0.699 ms
 Execution time: 978.433 ms

===========================================================================
Was this query always slow, or has it gotten slower over time? If the plan/execution of the query used to be different, do you have copies of those query plans? Has anything changed in your database other than the accumulation of data? 
The query is usually instantaneous. 
Here's the same query ran o a different server running the same database with comparable data (COLD server, frist run! The second run has execution time = 0.040ms):

  EXPLAIN (ANALYZE, BUFFERS) select "IdRand", "IdColoana", "Valoare" from "LucrareBugetDate" where ("LucrareBugetVersiuneId" = 92) and ("LucrareBugetDateId" in (10,11));  

Index Scan using "PK_LucrareBugetDate" on "LucrareBugetDate"  (cost=0.56..4.85 rows=2 width=13) (actual time=22.922..23.123 rows=2 loops=1)
  Index Cond: (("LucrareBugetVersiuneId" = 92) AND ("LucrareBugetDateId" = ANY ('{10,11}'::integer[])))
  Buffers: shared hit=12 read=4
Planning time: 66.743 ms
Execution time: 23.190 ms

===========================================================================

Hardware: 

2 x  Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz, 128 GBhz, Local ZFS-based storage built from 4 x NVME SSD drives.
I doubt it's hardware related.

===========================================================================

SELECT * FROM pg_stat_user_tables WHERE relname='table_name'; 
  relid | schemaname |     relname      | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze |          last_vacuum          | last_autovacuum |         last_analyze          |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
-------+------------+------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------------------------+-----------------+-------------------------------+-------------------------------+--------------+------------------+---------------+-------------------
 20655 | public     | LucrareBugetDate |      306 |   7765749768 |  8398680 |  983464378904 |  58388025 |   2944618 |  16675590 |       2887093 |   41712435 |      61524 |             2588381 | 2019-11-03 19:15:58.765546+00 |                 | 2020-01-15 16:11:26.301756+00 | 2019-12-20 10:12:53.737619+00 |            1 |                0 |            40 |                12
  ===========================================================================  

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation FROM pg_stats WHERE attname='LucrareBugetDateId' AND tablename='LucrareBugetDate' ORDER BY 1 DESC;

  frac_mcv  |    tablename     |      attname       | inherited | null_frac | n_distinct  | n_mcv | n_hist | correlation
------------+------------------+--------------------+-----------+-----------+-------------+-------+--------+-------------
 0.00666667 | LucrareBugetDate | LucrareBugetDateId | f         |         0 | 2.02985e+06 |   100 |    101 |   0.0631249

 =========================================================================== 

 SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation FROM pg_stats WHERE attname='LucrareBugetVersiuneId' AND tablename='LucrareBugetDate' ORDER BY 1 DESC;
 frac_mcv |    tablename     |        attname         | inherited | null_frac | n_distinct | n_mcv | n_hist | correlation
----------+------------------+------------------------+-----------+-----------+------------+-------+--------+-------------
        1 | LucrareBugetDate | LucrareBugetVersiuneId | f         |         0 |         22 |    22 |        |    0.624823
(1 row)
   ===========================================================================  

I think I went through most of https://wiki.postgresql.org/wiki/SlowQueryQuestions
I can provide more information if helpful. 

On Thu, 16 Jan 2020 at 17:11, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Cosmin Prund <cprund@xxxxxxxxx> writes:

> explain analyze

>    select R, C, V from LBD

>    where Ver = 92 and Id in (10,11)

> Index Scan using "IX_LBD_Ver_Id" on "LBD"  (cost=0.56..2.37 rows=1

> width=13) (actual time=0.063..857.725 rows=2 loops=1)

>   Index Cond: ("Ver" = 92)

>   Filter: ("Id" = ANY ('{10,11}'::integer[]))

>   Rows Removed by Filter: 1869178

> Planning time: 0.170 ms

> Execution time: 857.767 ms

> The  IX_LBD_Ver_Id index is on two columns (Ver, Id) - it's not in "Ver"

> alone!

Seems like an odd choice of plan, then, but you haven't provided any

detail that would let anyone guess why it's not using the second index

column.  For starters it would be good to show the exact table and

index schema (eg via \d+ in psql).  Also, does explicitly ANALYZE'ing

the table change anything?

https://wiki.postgresql.org/wiki/Slow_Query_Questions

                        regards, tom lane