Re: Advice needed: query performance deteriorates by 2000% within 1 minute

Hello, 

Michael, Tom: thanks for all the insights and information in your previous mails.

A quick update of the EXPLAIN outputs (this time using EXPLAIN (ANALYZE, BUFFERS, VERBOSE)):

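For reference, the query behind both plans, reconstructed from the plan output (the exact way the tsquery is built on our side may differ slightly):

    SELECT version, content
      FROM orderstore."order"
     WHERE jsonb_to_tsvector('english', content, '["all"]') @@ '1.20709841'::tsquery
     LIMIT 10;
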
The good: 

LOG Time: | 2022-02-28 09:30:01.400777+01 | order rows: |  9668

 

 Limit  (cost=616.37..653.30 rows=10 width=22) (actual time=1.062..1.063 rows=1 loops=1)

   Output: version, content

   Buffers: shared hit=154

   ->  Bitmap Heap Scan on orderstore."order"  (cost=616.37..793.63 rows=48 width=22) (actual time=1.061..1.062 rows=1 loops=1)

         Output: version, content

         Recheck Cond: (jsonb_to_tsvector('english'::regconfig, "order".content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)

         Heap Blocks: exact=1

         Buffers: shared hit=154

         ->  Bitmap Index Scan on idx_fulltext_content  (cost=0.00..616.36 rows=48 width=0) (actual time=1.053..1.053 rows=1 loops=1)

               Index Cond: (jsonb_to_tsvector('english'::regconfig, "order".content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)

               Buffers: shared hit=153

Planning:

   Buffers: shared hit=50

Planning Time: 0.408 ms

Execution Time: 1.079 ms

pg_stat_all_tables: 


n_tup_ins           | 102

n_tup_upd           | 1554

n_tup_del           | 0

n_tup_hot_upd       | 0

n_live_tup          | 9668

n_dead_tup          | 69

n_mod_since_analyze | 61

n_ins_since_vacuum  | 8

last_vacuum         | 2022-02-25 07:54:46.196508+01

last_autovacuum     |

last_analyze        | 2022-02-28 03:20:38.761482+01

last_autoanalyze    |


The bad: 


LOG Time: | 2022-02-28 09:45:01.662702+01 | order rows: |  9668

 

LIMIT 10:

 

Limit  (cost=0.00..805.63 rows=10 width=22) (actual time=24175.964..25829.767 rows=1 loops=1)

   Output: version, content

   Buffers: shared hit=26284 read=12550 dirtied=4

   ->  Seq Scan on orderstore."order"  (cost=0.00..3867.01 rows=48 width=22) (actual time=24175.962..25829.763 rows=1 loops=1)

         Output: version, content

         Filter: (jsonb_to_tsvector('english'::regconfig, "order".content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)

         Rows Removed by Filter: 9667

         Buffers: shared hit=26284 read=12550 dirtied=4

Planning:

   Buffers: shared hit=50

Planning Time: 0.377 ms

Execution Time: 25829.778 ms


pg_stat_all_tables:

n_tup_ins           | 102

n_tup_upd           | 1585

n_tup_del           | 0

n_tup_hot_upd       | 0

n_live_tup          | 9668

n_dead_tup          | 100

n_mod_since_analyze | 92

n_ins_since_vacuum  | 8

last_vacuum         | 2022-02-25 07:54:46.196508+01

last_autovacuum     |

last_analyze        | 2022-02-28 03:20:38.761482+01

last_autoanalyze    |


The ugly: 


It should be mentioned that the table in question mostly lives in TOAST land (though I have no idea whether this also influences the query planner):


    oid   | table_schema | table_name | row_estimate | total_bytes | index_bytes | toast_bytes | table_bytes | total  | index  | toast | table
--------+--------------+------------+--------------+-------------+-------------+-------------+-------------+--------+--------+-------+-------
 155544 | orderstore   | order      |         9570 |   229826560 |   120184832 |    98557952 |    11083776 | 219 MB | 115 MB | 94 MB | 11 MB
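
These numbers can be pulled with a catalogue query along these lines (a sketch of the usual pg_class/pg_namespace size query, not necessarily the exact statement we ran; table_bytes is total minus index minus toast):

    SELECT c.oid,
           n.nspname                               AS table_schema,
           c.relname                               AS table_name,
           c.reltuples::bigint                     AS row_estimate,
           pg_total_relation_size(c.oid)           AS total_bytes,
           pg_indexes_size(c.oid)                  AS index_bytes,
           pg_total_relation_size(c.reltoastrelid) AS toast_bytes
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE n.nspname = 'orderstore'
       AND c.relname = 'order';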



Since tinkering with the text search functions is out of the question, we came up with three possibilities for dealing with this issue:

- significantly increase the LIMIT or omit it altogether (meh ...)
- use 'set random_page_cost = 0.5' within the transaction to convince the query planner to prefer the index (tested and works; see the sketch below)
- schedule an hourly vacuum job for the table (the most likely solution we will settle on, since it comes with the least implementation effort; also sketched below)
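
A minimal sketch of the second and third options (database name and schedule are placeholders):

    BEGIN;
    SET LOCAL random_page_cost = 0.5;   -- scoped to this transaction only
    SELECT version, content
      FROM orderstore."order"
     WHERE jsonb_to_tsvector('english', content, '["all"]') @@ '1.20709841'::tsquery
     LIMIT 10;
    COMMIT;

    -- hourly vacuum/analyze via cron (connection details are placeholders):
    -- 0 * * * *  psql -d orderdb -c 'VACUUM (ANALYZE) orderstore."order";'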

None of these seems very elegant or viable in the long run ... we'll see.

Ah, yes: our global settings for random_page_cost and autovacuum/analyze are set to the defaults.
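
An untested alternative to the cron job would be to lower the autovacuum/autoanalyze thresholds for just this table instead of touching the global defaults (a sketch; the exact scale factors would need tuning):

    ALTER TABLE orderstore."order"
      SET (autovacuum_vacuum_scale_factor  = 0.01,
           autovacuum_analyze_scale_factor = 0.01);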

Will JSON processing see any improvements in PG 14/15? We are about to upgrade to 14 in the near future; our devs say this topic is the main trigger for doing so.

Any further thoughts on the case are very much appreciated. 

kr p. 


On Thu, Feb 24, 2022 at 17:10, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Peter Adlersburg <peter.adlersburg@xxxxxxxxx> writes:
>  Limit  (cost=0.00..804.97 rows=10 width=22) (actual
> time=23970.845..25588.432 rows=1 loops=1)
>    ->  Seq Scan on "order"  (cost=0.00..3863.86 rows=48 width=22) (actual
> time=23970.843..25588.429 rows=1 loops=1)
>          Filter: (jsonb_to_tsvector('english'::regconfig, content,
> '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
>          Rows Removed by Filter: 9652
>  Planning Time: 0.430 ms
>  Execution Time: 25588.448 ms

I think the expense here comes from re-executing jsonb_to_tsvector
a lot of times.  By default that's estimated as 100 times more expensive
than a simple function (such as addition), but these results make it
seem like that's an understatement.  You might try something like

alter function jsonb_to_tsvector(regconfig, jsonb, jsonb) cost 1000;

to further discourage the planner from picking this plan shape.

Possibly the cost estimate for ts_match_tq (the function underlying
this variant of @@) needs to be bumped up as well.

(Bear in mind that pg_dump will not propagate such hacks on
system-defined objects, so you'll need a note to reapply
any such changes after dump/reload or pg_upgrade.)

                        regards, tom lane
