Dear fellow DBAs,
I am seeking some guidance on the following case that our developers have thrown at me, and I apologize in advance for this lengthy mail ...
$> postgres --version
postgres (PostgreSQL) 13.6
We are dealing with the following issue:
select version, content from orderstore."order"
WHERE jsonb_to_tsvector('english', content, '["all"]') @@ websearch_to_tsquery('english', '1.20709841') limit 10 ;
The devs told me that this query normally finishes within a reasonable amount of time (< 1 sec), but every day, all of a sudden, performance degrades to execution times of > 20 sec.
Furthermore, I was told:
"When we change the query to 'limit 100' it runs fast again"
"When we execute a 'vacuum orderstore.order' everything becomes good again - but that only lasts for a few hours"
So I scheduled a little script to be executed every minute which runs three EXPLAIN (ANALYZE) statements (see the sketch right after this list):
1 query with LIMIT 10
1 query with LIMIT 100
1 query with the LIMIT clause omitted
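In essence, the script runs the following each minute (cron/psql boilerplate omitted; the exact EXPLAIN options and formatting of the "LOG:" line may have differed slightly):

-- the "LOG:" line: a tag, the current timestamp and the row count of the table
SELECT 'LOG:', current_timestamp, count(*) FROM orderstore."order";

-- 1) LIMIT 10
EXPLAIN (ANALYZE)
SELECT version, content FROM orderstore."order"
WHERE jsonb_to_tsvector('english', content, '["all"]') @@ websearch_to_tsquery('english', '1.20709841')
LIMIT 10;

-- 2) LIMIT 100
EXPLAIN (ANALYZE)
SELECT version, content FROM orderstore."order"
WHERE jsonb_to_tsvector('english', content, '["all"]') @@ websearch_to_tsquery('english', '1.20709841')
LIMIT 100;

-- 3) LIMIT clause omitted
EXPLAIN (ANALYZE)
SELECT version, content FROM orderstore."order"
WHERE jsonb_to_tsvector('english', content, '["all"]') @@ websearch_to_tsquery('english', '1.20709841');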
And here's a quick grep of the result after a few hours:
...
Execution Time: 1.413 ms <= limit 10
Execution Time: 0.389 ms <= limit 100
Execution Time: 0.297 ms <= limit clause omitted
Execution Time: 1.456 ms
Execution Time: 0.396 ms
Execution Time: 0.302 ms
Execution Time: 1.412 ms
Execution Time: 0.428 ms
Execution Time: 0.255 ms
Execution Time: 1.404 ms
Execution Time: 0.403 ms
Execution Time: 0.258 ms
Execution Time: 25588.448 ms <= limit 10
Execution Time: 0.919 ms <= limit 100
Execution Time: 0.453 ms <= limit clause omitted
Execution Time: 25657.524 ms
Execution Time: 0.965 ms
Execution Time: 0.452 ms
Execution Time: 25843.139 ms
Execution Time: 0.959 ms
Execution Time: 0.446 ms
Execution Time: 25631.389 ms
Execution Time: 0.946 ms
Execution Time: 0.447 ms
Execution Time: 25452.764 ms
Execution Time: 0.937 ms
Execution Time: 0.444 ms
<= here I manually vacuumed the table
Execution Time: 0.071 ms
Execution Time: 0.021 ms
Execution Time: 0.015 ms
"When we change the query to 'limit 100' it runs fast again"
"When we execute a 'vacuum orderstore.order' everything becomes good again - but that only lasts for a few hours"
So I scheduled a little script to be executed every minute which contains 3 explains.
1 query with limit 10
1 query with limit 100
1 query with the limit-clause omitted
And here's a quick grep of the result after a few hours:
...
Execution Time: 1.413 ms <= limit 10
Execution Time: 0.389 ms <= limit 100
Execution Time: 0.297 ms <= limit clause omitted
Execution Time: 1.456 ms
Execution Time: 0.396 ms
Execution Time: 0.302 ms
Execution Time: 1.412 ms
Execution Time: 0.428 ms
Execution Time: 0.255 ms
Execution Time: 1.404 ms
Execution Time: 0.403 ms
Execution Time: 0.258 ms
Execution Time: 25588.448 ms <= limit 10
Execution Time: 0.919 ms <= limit 100
Execution Time: 0.453 ms <= limit clause omitted
Execution Time: 25657.524 ms
Execution Time: 0.965 ms
Execution Time: 0.452 ms
Execution Time: 25843.139 ms
Execution Time: 0.959 ms
Execution Time: 0.446 ms
Execution Time: 25631.389 ms
Execution Time: 0.946 ms
Execution Time: 0.447 ms
Execution Time: 25452.764 ms
Execution Time: 0.937 ms
Execution Time: 0.444 ms
<= here I manually vacuumed the table
Execution Time: 0.071 ms
Execution Time: 0.021 ms
Execution Time: 0.015 ms
Execution Time: 0.072 ms
Execution Time: 0.023 ms
Execution Time: 0.017 ms
Execution Time: 0.064 ms
Execution Time: 0.021 ms
Execution Time: 0.015 ms
Execution Time: 0.063 ms
Execution Time: 0.020 ms
Execution Time: 0.015 ms
...
It turned out the devs were right with their complaints.
The execution plan changed within one minute from using an index scan to a sequential scan.
Here are the details (in the "LOG:" line I select the current timestamp and the row count of the table, as in the script sketch above):
****************************
*** the last 'good' run: ***
****************************
LOG: | 2022-02-24 13:47:01.747416+01 | 9653
LIMIT 10:
Limit (cost=752.37..789.30 rows=10 width=22) (actual time=1.388..1.390 rows=1 loops=1)
-> Bitmap Heap Scan on "order" (cost=752.37..929.63 rows=48 width=22) (actual time=1.387..1.388 rows=1 loops=1)
Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_fulltext_content (cost=0.00..752.36 rows=48 width=0) (actual time=1.374..1.374 rows=1 loops=1)
Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.401 ms
Execution Time: 1.404 ms
LIMIT 100:
Limit (cost=752.37..929.63 rows=48 width=22) (actual time=0.391..0.391 rows=1 loops=1)
-> Bitmap Heap Scan on "order" (cost=752.37..929.63 rows=48 width=22) (actual time=0.390..0.391 rows=1 loops=1)
Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_fulltext_content (cost=0.00..752.36 rows=48 width=0) (actual time=0.387..0.387 rows=1 loops=1)
Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.136 ms
Execution Time: 0.403 ms
NO LIMIT:
Bitmap Heap Scan on "order" (cost=752.37..929.63 rows=48 width=22) (actual time=0.248..0.249 rows=1 loops=1)
Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_fulltext_content (cost=0.00..752.36 rows=48 width=0) (actual time=0.245..0.245 rows=1 loops=1)
Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.107 ms
Execution Time: 0.258 ms
*********************************************
*** the first 'bad' run (one minute later) ***
*********************************************
LOG: | 2022-02-24 13:48:01.840362+01 | 9653
LIMIT 10:
Limit (cost=0.00..804.97 rows=10 width=22) (actual time=23970.845..25588.432 rows=1 loops=1)
-> Seq Scan on "order" (cost=0.00..3863.86 rows=48 width=22) (actual time=23970.843..25588.429 rows=1 loops=1)
Filter: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Rows Removed by Filter: 9652
Planning Time: 0.430 ms
Execution Time: 25588.448 ms
LIMIT 100:
Limit (cost=788.37..965.63 rows=48 width=22) (actual time=0.900..0.902 rows=1 loops=1)
-> Bitmap Heap Scan on "order" (cost=788.37..965.63 rows=48 width=22) (actual time=0.900..0.901 rows=1 loops=1)
Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_fulltext_content (cost=0.00..788.36 rows=48 width=0) (actual time=0.894..0.895 rows=1 loops=1)
Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.187 ms
Execution Time: 0.919 ms
NO LIMIT:
Bitmap Heap Scan on "order" (cost=788.37..965.63 rows=48 width=22) (actual time=0.442..0.442 rows=1 loops=1)
Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_fulltext_content (cost=0.00..788.36 rows=48 width=0) (actual time=0.438..0.438 rows=1 loops=1)
Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.151 ms
Execution Time: 0.453 ms
The table in question isn't that big:
oid | table_schema | table_name | row_estimate | total | index | toast | table
--------+--------------+------------+--------------+--------+--------+-------+-------
155544 | orderstore | order | 9649 | 210 MB | 108 MB | 91 MB | 10 MB
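A size overview in this shape can be produced with a pg_class-based query roughly like the following (a sketch, not necessarily the literal statement behind the numbers above):

-- disk usage of the single table, split into total / index / toast / heap
SELECT c.oid,
       n.nspname           AS table_schema,
       c.relname           AS table_name,
       c.reltuples::bigint AS row_estimate,
       pg_size_pretty(pg_total_relation_size(c.oid))           AS total,
       pg_size_pretty(pg_indexes_size(c.oid))                  AS index,
       pg_size_pretty(pg_total_relation_size(c.reltoastrelid)) AS toast,
       pg_size_pretty(pg_total_relation_size(c.oid)
                      - pg_indexes_size(c.oid)
                      - pg_total_relation_size(c.reltoastrelid)) AS "table"
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = 'orderstore'
   AND c.relname = 'order';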
Table DDL:
CREATE TABLE orderstore."order" (
pk_id bigint DEFAULT nextval('orderstore.order_pk_id_seq'::regclass) NOT NULL,
version integer NOT NULL,
content jsonb NOT NULL,
manipulation_history jsonb NOT NULL,
CONSTRAINT chk_external_id_not_null CHECK (((content ->> 'externalId'::text) IS NOT NULL)),
CONSTRAINT chk_id_not_null CHECK (((content ->> 'id'::text) IS NOT NULL))
);
DDL of the index used (one amongst many others that exist):
--
-- Name: idx_fulltext_content; Type: INDEX; Schema: orderstore; Owner: orderstore
--
CREATE INDEX idx_fulltext_content ON orderstore."order" USING gin (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb));
The record in pg_stat_all_tables before the manual vacuum:
relid | 155544
schemaname | orderstore
relname | order
seq_scan | 249
seq_tup_read | 2209150
idx_scan | 24696
idx_tup_fetch | 1155483
n_tup_ins | 87
n_tup_upd | 1404
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 9653
n_dead_tup | 87
n_mod_since_analyze | 152
n_ins_since_vacuum | 4
last_vacuum | 2022-02-24 10:44:34.524241+01
last_autovacuum |
last_analyze | 2022-02-24 03:20:05.79219+01
last_autoanalyze |
vacuum_count | 3
autovacuum_count | 0
analyze_count | 8
autoanalyze_count | 0
The entry in pg_stat_all_tables after the manual vacuum:
relid | 155544
schemaname | orderstore
relname | order
seq_scan | 249
seq_tup_read | 2209150
idx_scan | 24753
idx_tup_fetch | 1155561
n_tup_ins | 87
n_tup_upd | 1404
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 9476
n_dead_tup | 0
n_mod_since_analyze | 152
n_ins_since_vacuum | 0
last_vacuum | 2022-02-24 14:32:16.083692+01
last_autovacuum |
last_analyze | 2022-02-24 03:20:05.79219+01
last_autoanalyze |
vacuum_count | 4
autovacuum_count | 0
analyze_count | 8
autoanalyze_count | 0
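(Both snapshots above were taken from pg_stat_all_tables, essentially with the following in psql's expanded display mode:)

\x on
SELECT *
  FROM pg_stat_all_tables
 WHERE schemaname = 'orderstore'
   AND relname = 'order';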
Can someone provide any hints on how to deal with this issue? What am I missing?
In case you need additional information, please let me know.
kind regards,
peter