Advice needed: query performance deteriorates by 2000% within 1 minute

Dear fellow DBAs,

I am seeking guidance on the following case that our developers have thrown at me,
and I apologize in advance for this lengthy mail ...

$> postgres --version
postgres (PostgreSQL) 13.6


We are dealing with the following issue:


SELECT version, content
  FROM orderstore."order"
 WHERE jsonb_to_tsvector('english', content, '["all"]') @@ websearch_to_tsquery('english', '1.20709841')
 LIMIT 10;


The devs told me that this query normally finishes within a reasonable amount of time (< 1 sec), but every day, all of a sudden, performance degrades to execution times of more than 20 sec.

Furthermore, I was told:

"When we change the query to 'limit 100' it runs fast again"
"When we execute a 'vacuum orderstore.order' everything becomes good again - but that only lasts for a few hours"

So I scheduled a little script to run every minute, executing three EXPLAIN (ANALYZE) statements (a sketch follows below the list):

1 query with limit 10
1 query with limit 100
1 query with the limit-clause omitted
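For completeness, the script is just a cron job that feeds roughly the following into psql (a minimal sketch; the real file also prints the "LOG:" line described further down, and the exact wrapping may differ):

-- same query as above, only the LIMIT clause varies between the three runs
EXPLAIN ANALYZE
SELECT version, content
  FROM orderstore."order"
 WHERE jsonb_to_tsvector('english', content, '["all"]') @@ websearch_to_tsquery('english', '1.20709841')
 LIMIT 10;

EXPLAIN ANALYZE
SELECT version, content
  FROM orderstore."order"
 WHERE jsonb_to_tsvector('english', content, '["all"]') @@ websearch_to_tsquery('english', '1.20709841')
 LIMIT 100;

EXPLAIN ANALYZE
SELECT version, content
  FROM orderstore."order"
 WHERE jsonb_to_tsvector('english', content, '["all"]') @@ websearch_to_tsquery('english', '1.20709841');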

And here's a quick grep of the result after a few hours:

 ...

 Execution Time: 1.413 ms       <= limit 10
 Execution Time: 0.389 ms       <= limit 100
 Execution Time: 0.297 ms       <= limit clause omitted
 Execution Time: 1.456 ms
 Execution Time: 0.396 ms
 Execution Time: 0.302 ms
 Execution Time: 1.412 ms
 Execution Time: 0.428 ms
 Execution Time: 0.255 ms
 Execution Time: 1.404 ms
 Execution Time: 0.403 ms
 Execution Time: 0.258 ms
 Execution Time: 25588.448 ms    <= limit 10
 Execution Time: 0.919 ms        <= limit 100
 Execution Time: 0.453 ms        <= limit clause omitted
 Execution Time: 25657.524 ms
 Execution Time: 0.965 ms
 Execution Time: 0.452 ms
 Execution Time: 25843.139 ms
 Execution Time: 0.959 ms
 Execution Time: 0.446 ms
 Execution Time: 25631.389 ms
 Execution Time: 0.946 ms
 Execution Time: 0.447 ms
 Execution Time: 25452.764 ms
 Execution Time: 0.937 ms
 Execution Time: 0.444 ms
                                  <= here I manually vacuumed the table
 Execution Time: 0.071 ms
 Execution Time: 0.021 ms
 Execution Time: 0.015 ms
 Execution Time: 0.072 ms
 Execution Time: 0.023 ms
 Execution Time: 0.017 ms
 Execution Time: 0.064 ms
 Execution Time: 0.021 ms
 Execution Time: 0.015 ms
 Execution Time: 0.063 ms
 Execution Time: 0.020 ms
 Execution Time: 0.015 ms

                                 
 ...


It turned out the devs were right in their complaints.

Within one minute, the execution plan for the LIMIT 10 query switched from a bitmap index scan to a sequential scan.

Here are the details (in the "LOG:" line I select the current timestamp and the row count of the table):
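The "LOG:" line itself is produced by a plain select, roughly along these lines (exact formatting in the script may differ):

SELECT 'LOG:', now(), count(*) FROM orderstore."order";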

****************************
*** the last 'good' run: ***
****************************

 LOG:     | 2022-02-24 13:47:01.747416+01 |  9653

 LIMIT 10:

 Limit  (cost=752.37..789.30 rows=10 width=22) (actual time=1.388..1.390 rows=1 loops=1)
   ->  Bitmap Heap Scan on "order"  (cost=752.37..929.63 rows=48 width=22) (actual time=1.387..1.388 rows=1 loops=1)
         Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_fulltext_content  (cost=0.00..752.36 rows=48 width=0) (actual time=1.374..1.374 rows=1 loops=1)
               Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
 Planning Time: 0.401 ms
 Execution Time: 1.404 ms

 LIMIT 100:

 Limit  (cost=752.37..929.63 rows=48 width=22) (actual time=0.391..0.391 rows=1 loops=1)
   ->  Bitmap Heap Scan on "order"  (cost=752.37..929.63 rows=48 width=22) (actual time=0.390..0.391 rows=1 loops=1)
         Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_fulltext_content  (cost=0.00..752.36 rows=48 width=0) (actual time=0.387..0.387 rows=1 loops=1)
               Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
 Planning Time: 0.136 ms
 Execution Time: 0.403 ms

 NO LIMIT:

 Bitmap Heap Scan on "order"  (cost=752.37..929.63 rows=48 width=22) (actual time=0.248..0.249 rows=1 loops=1)
   Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_fulltext_content  (cost=0.00..752.36 rows=48 width=0) (actual time=0.245..0.245 rows=1 loops=1)
         Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
 Planning Time: 0.107 ms
 Execution Time: 0.258 ms


*********************************************
*** the first 'bad' run (one minute later) ***
*********************************************

 LOG:     | 2022-02-24 13:48:01.840362+01 |  9653

 LIMIT 10:

 Limit  (cost=0.00..804.97 rows=10 width=22) (actual time=23970.845..25588.432 rows=1 loops=1)
   ->  Seq Scan on "order"  (cost=0.00..3863.86 rows=48 width=22) (actual time=23970.843..25588.429 rows=1 loops=1)
         Filter: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
         Rows Removed by Filter: 9652
 Planning Time: 0.430 ms
 Execution Time: 25588.448 ms

 LIMIT 100:

 Limit  (cost=788.37..965.63 rows=48 width=22) (actual time=0.900..0.902 rows=1 loops=1)
   ->  Bitmap Heap Scan on "order"  (cost=788.37..965.63 rows=48 width=22) (actual time=0.900..0.901 rows=1 loops=1)
         Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_fulltext_content  (cost=0.00..788.36 rows=48 width=0) (actual time=0.894..0.895 rows=1 loops=1)
               Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
 Planning Time: 0.187 ms
 Execution Time: 0.919 ms

 NO LIMIT:

 Bitmap Heap Scan on "order"  (cost=788.37..965.63 rows=48 width=22) (actual time=0.442..0.442 rows=1 loops=1)
   Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_fulltext_content  (cost=0.00..788.36 rows=48 width=0) (actual time=0.438..0.438 rows=1 loops=1)
         Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
 Planning Time: 0.151 ms
 Execution Time: 0.453 ms




The table in question isn't that big:

  oid   | table_schema | table_name | row_estimate | total  | index  | toast | table
--------+--------------+------------+--------------+--------+--------+-------+-------
 155544 | orderstore   | order      |         9649 | 210 MB | 108 MB | 91 MB | 10 MB
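(The size breakdown above comes from a small catalog query I use, roughly along these lines; nothing fancy, and row_estimate is only the reltuples estimate:)

SELECT c.oid,
       n.nspname AS table_schema,
       c.relname AS table_name,
       c.reltuples::bigint AS row_estimate,
       pg_size_pretty(pg_total_relation_size(c.oid))           AS total,
       pg_size_pretty(pg_indexes_size(c.oid))                  AS "index",
       pg_size_pretty(pg_total_relation_size(c.reltoastrelid)) AS toast,
       pg_size_pretty(pg_relation_size(c.oid))                 AS "table"
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = 'orderstore'
   AND c.relname = 'order';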



Table DDL:

CREATE TABLE orderstore."order" (
    pk_id bigint DEFAULT nextval('orderstore.order_pk_id_seq'::regclass) NOT NULL,
    version integer NOT NULL,
    content jsonb NOT NULL,
    manipulation_history jsonb NOT NULL,
    CONSTRAINT chk_external_id_not_null CHECK (((content ->> 'externalId'::text) IS NOT NULL)),
    CONSTRAINT chk_id_not_null CHECK (((content ->> 'id'::text) IS NOT NULL))
);

DDL of the index used (one amongst many others that exist on the table):

--
-- Name: idx_fulltext_content; Type: INDEX; Schema: orderstore; Owner: orderstore
--

CREATE INDEX idx_fulltext_content ON orderstore."order" USING gin (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb));


The record in pg_stat_all_tables before the manual vacuum:

relid               | 155544
schemaname          | orderstore
relname             | order
seq_scan            | 249
seq_tup_read        | 2209150
idx_scan            | 24696
idx_tup_fetch       | 1155483
n_tup_ins           | 87
n_tup_upd           | 1404
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 9653
n_dead_tup          | 87
n_mod_since_analyze | 152
n_ins_since_vacuum  | 4
last_vacuum         | 2022-02-24 10:44:34.524241+01
last_autovacuum     |
last_analyze        | 2022-02-24 03:20:05.79219+01
last_autoanalyze    |
vacuum_count        | 3
autovacuum_count    | 0
analyze_count       | 8
autoanalyze_count   | 0

The entry in pg_stat_all_tables after the manual vacuum:

relid               | 155544
schemaname          | orderstore
relname             | order
seq_scan            | 249
seq_tup_read        | 2209150
idx_scan            | 24753
idx_tup_fetch       | 1155561
n_tup_ins           | 87
n_tup_upd           | 1404
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 9476
n_dead_tup          | 0
n_mod_since_analyze | 152
n_ins_since_vacuum  | 0
last_vacuum         | 2022-02-24 14:32:16.083692+01
last_autovacuum     |
last_analyze        | 2022-02-24 03:20:05.79219+01
last_autoanalyze    |
vacuum_count        | 4
autovacuum_count    | 0
analyze_count       | 8
autoanalyze_count   | 0
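
(Both snapshots above are simply expanded-display output of the following, taken right before and right after the manual vacuum:)

SELECT *
  FROM pg_stat_all_tables
 WHERE schemaname = 'orderstore'
   AND relname = 'order';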


Can someone provide any hints on how to deal with this issue? What am I missing?

In case you need additional information, please let me know.


kind regards,

peter 

