GIN JSONB path index is not always used

Tomasz Szymański <lime129@xxxxxxxxx> · Tue, 17 Oct 2023 15:48:41 +0200

Hello! We have an issue with database planner choosing really expensive sequence scan instead of an index scan in some cases.
I'm reaching out in order to maybe get some idea on what we're dealing with / what could be the actual issue here.
The table affected is the users table with a field called "private_metadata" which is JSONB type.
There are currently two GIN indexes on that table - one GIN and second which is GIN jsonb_path_ops.
The operator that is used for the problematic query is contains (@>) so jsonb_path_ops should be preferred according to docs https://www.postgresql.org/docs/11/gin-builtin-opclasses.html.
However the database sometimes decides NOT to use the index and perform the sequence scan.
I have observed it happens in intervals - ex. 10 hours everything is ok, then something snaps and there are sequence scans for few more hours, then back to index and so on.
More context:
- Database version: 11.18
- The table schema with indexes
CREATE TABLE account_user (
    id integer NOT NULL,
    private_metadata jsonb,
);
CREATE INDEX user_p_meta_idx ON account_user USING gin (private_metadata);
CREATE INDEX user_p_meta_jsonb_path_idx ON account_user USING gin (private_metadata jsonb_path_ops);
- The query that is perfomed
SELECT "account_user"."id", "account_user"."private_metadata"
FROM "account_user"
WHERE "account_user"."private_metadata" @ > '{"somekey": "somevalue"}'
LIMIT 21;
- Plan when it uses an index
{
  "Plan": {
    "Node Type": "Limit",
    "Parallel Aware": false,
    "Startup Cost": 1091.29,
    "Total Cost": 1165.26,
    "Plan Rows": 21,
    "Plan Width": 4,
    "Plans": [
      {
        "Node Type": "Bitmap Heap Scan",
        "Parent Relationship": "Outer",
        "Parallel Aware": false,
        "Relation Name": "account_user",
        "Alias": "account_user",
        "Startup Cost": 1091.29,
        "Total Cost": 17129.12,
        "Plan Rows": 4553,
        "Plan Width": 4,
        "Recheck Cond": "( private_metadata @ > ? :: jsonb )",
        "Plans": [
          {
            "Node Type": "Bitmap Index Scan",
            "Parent Relationship": "Outer",
            "Parallel Aware": false,
            "Index Name": "user_p_meta_jsonb_path_idx",
            "Startup Cost": 0,
            "Total Cost": 1090.15,
            "Plan Rows": 4553,
            "Plan Width": 0,
            "Index Cond": "( private_metadata @ > ? :: jsonb )"
          }
        ]
      }
    ]
  }
}
- Plan when it doesn't use an index
{
  "Plan": {
    "Node Type": "Limit",
    "Parallel Aware": false,
    "Startup Cost": 0,
    "Total Cost": 1184.3,
    "Plan Rows": 21,
    "Plan Width": 4,
    "Plans": [
      {
        "Node Type": "Seq Scan",
        "Parent Relationship": "Outer",
        "Parallel Aware": false,
        "Relation Name": "account_user",
        "Alias": "account_user",
        "Startup Cost": 0,
        "Total Cost": 256768.27,
        "Plan Rows": 4553,
        "Plan Width": 4,
        "Filter": "( private_metadata @ > ? :: jsonb )"
      }
    ]
  }
}
- There are ~4.5M rows on the table
- We currently have a script that is heavily inserting rows 24/7, about 130k rows / day (1,5 row/s)
- It seems maybe the index can't keep up(?) because of this heavy insertion
SELECT * FROM pgstatginindex('user_p_meta_jsonb_path_idx');
 version | pending_pages | pending_tuples
---------+---------------+----------------
       2 |            98 |          28807
(1 row)
Might it be the case that is cloggs up and cannot use the index when reading?
- Last autovacuum for some reason happened 4 days ago
select * from pg_stat_user_tables where relname = 'account_user';
-[ RECORD 1
]-------+-----------------------------
relid               | 74937
schemaname          | public
relname             | account_user
seq_scan            | 66578
seq_tup_read        | 99542628744
idx_scan            | 9342647
idx_tup_fetch       | 105527685
n_tup_ins           | 518901
n_tup_upd           | 684607
n_tup_del           | 25
n_tup_hot_upd       | 82803
n_live_tup          | 4591412
n_dead_tup          | 370828
n_mod_since_analyze | 152968
last_vacuum         |
last_autovacuum     | 2023-10-13 07: 35: 29.11448+00
last_analyze        |
last_autoanalyze    | 2023-10-13 07: 44: 31.90437+00
vacuum_count        | 0
autovacuum_count    | 2
analyze_count       | 0
- The plans above comes from a production system, I failed to reproduce it locally, only two cases in which I managed to get a seq scan were:
a) when there are very few rows in the table
b) when I run the filter @> '{}' in which case I suppose postgres deduces "you want everything" so additional index lookup in not necessary (checked and this filter is impossible to induce by our code).

Any feedback is appreciated! What would be the possible reason the planner chooses seq scan in that case?