Re: Please help! Query jumps from 1s -> 4m

James Thompson <james@xxxxxxxxxxxxx> · Mon, 4 May 2020 20:12:06 +0100

The change is abrupt, on the 10th execution (but I hadn't spotted it was always after the same number of executions until your suggestion - thanks for pointing me in that direction).

I don't see any custom configuration on our end that changes the threshold for this from 5->10. Debugging the query call I also see that PgConnection has the prepareThreshold set to 5.

Additionally, the execution plans for the 10th + following queries look fine, they have the same structure as if I run the query manually. It's not that the query plan switches, it seems as though the same query plan is just > 200X slower than usual.

As for the heap fetches -> as far as I can tell, on both occasions the fetches are relatively low and shouldn't account for minutes of execution (even if one is lower than the other). Looking through one days logs I do find cases with lower heap fetches too, for example as below which has 1977 fetches instead of the previous 6940 but took approx the same time:
->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias1  (cost=0.56..17.25 rows=1 width=60) (actual time=56.858..120893.874 rows=67000 loops=1)
        Index Cond: (col20 = $2005)
        Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY ((ARRAY[$1004, ..., $2003])::text[])))
        Rows Removed by Filter: 2662793
        Heap Fetches: 1977
        Buffers: shared hit=84574 read=3522

Would you agree the statement threshold / heap fetches seems unlikely to be causing this? Any other thoughts?

Thanks!

On Sun, 3 May 2020 at 16:38, Justin Pryzby <pryzby@xxxxxxxxxxxxx> wrote:
On Sun, May 03, 2020 at 09:58:27AM +0100, James Thompson wrote:

> Hi,

> 

> Hoping someone can help with this performance issue that's been driving a

> few of us crazy :-) Any guidance greatly appreciated.

> 

> A description of what you are trying to achieve and what results you

> expect.:

>  - I'd like to get an understanding of why the following query (presented

> in full, but there are specific parts that are confusing me) starts off

> taking ~second in duration but 'switches' to taking over 4 minutes.

Does it "switch" abruptly or do you get progressively slower queries ?

If it's abrupt following the 5th execution, I guess you're hitting this:

https://www.postgresql.org/message-id/Pine.BSO.4.64.0802131404090.6785@xxxxxxxxxxxxxxx

https://www.postgresql.org/message-id/A737B7A37273E048B164557ADEF4A58B50FB8D5E@xxxxxxxxxxxxxxxxxxxxxxxxxxxx

>  - we initially saw this behaviour for the exact same sql with a different

> index that resulted in an index scan. To try and fix the issue we've

> created an additional index with additional included fields so we now have

> Index Only Scans, but are still seeing the same problem.

>  Segments of interest:

>  1. ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on

> table1 table1alias1  (cost=0.56..17.25 rows=1 width=60) (actual

> time=110.539..123828.134 rows=67000 loops=1)

>         Index Cond: (col20 = $2005)

>         Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY

> ((ARRAY[$1004, ..., $2003])::text[])))

>         Rows Removed by Filter: 2662652

>         Heap Fetches: 6940

>         Buffers: shared hit=46619 read=42784 written=52

> If I run the same queries now:

> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1

> table1alias1  (cost=0.56..2549.69 rows=69 width=36)

> (actual time=1.017..1221.375 rows=67000 loops=1)

> Heap Fetches: 24

> Buffers: shared hit=2849 read=2483

It looks to me like you're getting good performance following a vacuum, when

Heap Fetches is low.  So you'd want to run vacuum more often, like:

| ALTER TABLE table1 SET (autovacuum_vacuum_scale_factor=0.005).

But maybe I've missed something - you showed the bad query plan, but not the

good one, and I wonder if they may be subtly different, and that's maybe masked

by the replaced identifiers.

-- 

Justin