Hi Postgres community, We seem to have found a situation where a query run using explain analyse or pgbench is incredibly fast, but run via Java under load performs very poorly – we’ve checked query performance metrics for our Postgres
instance and can confirm that it’s the query running slowly, and not a Java problem. The table has ~7 million rows, with total size of 44GB. As shown below, explain analyse gives a sub 1ms execution time, yet in practice the query takes an average of 2 seconds – it’s worth mentioning that
a small percentage of requests do take ~1ms, so something non-deterministic is happening here. The query in question is below, with explain analyse: explain analyse SELECT coalesce ( (SELECT min(msg_offset) - 1 FROM events WHERE created_utc >= CURRENT_TIMESTAMP - INTERVAL '60 SECONDS' ), (SELECT max(msg_offset) FROM events), 0 );
QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- - Result
(cost=2.03..2.04 rows=1 width=8) (actual time=0.183..0.183 rows=1 loops=1) InitPlan 1 (returns $1) ->
Aggregate
(cost=1.56..1.57 rows=1 width=8) (actual time=0.028..0.028 rows=1 loops=1) ->
Index Only Scan using test_4 on events
(cost=0.44..1.56 rows=1 width=8) (actual time=0.023..0.023 rows=0 loops=1) Index Cond: (created_utc >= (now() - '00:01:00'::interval)) Heap Fetches: 0 InitPlan 3 (returns $3) ->
Result
(cost=0.45..0.46 rows=1 width=8) (actual time=0.153..0.153 rows=1 loops=1) InitPlan 2 (returns $2) ->
Limit
(cost=0.43..0.45 rows=1 width=8) (actual time=0.152..0.152 rows=1 loops=1) ->
Index Only Scan Backward using events_pkey on events events_1
(cost=0.43..156842.57 rows=7643099 width=8) (actual time=0.151..0.151 rows=1 loops=1)
Index Cond: (msg_offset IS NOT NULL)
Heap Fetches: 0 Planning time: 2.480 ms Execution time: 0.242 ms (15 rows) Output of \d events, with a few test indexes added to try and improve performance:
Table "public.events" Column
|
Type
| Collation | Nullable |
Default --------------+-----------------------------+-----------+----------+-------------------------------------------- msg_offset
| bigint
|
| not null | nextval('events_msg_offset_seq'::regclass) msg_key
| character varying
|
| not null | content_type | character varying
|
| not null | type
| character varying
|
| not null | created_utc
| timestamp without time zone |
| not null | data
| text
|
| | event_size
| integer
|
| not null | 1048576 cluster_id
| bigint
|
| not null | 1 Indexes: "events_pkey" PRIMARY KEY, btree (msg_offset) "cluster_idx" btree (msg_key, cluster_id) "cluster_type_filter_idx" btree (type, cluster_id) "created_idx" btree (created_utc, msg_key) "key_idx" btree (msg_key) "test_1" btree (created_utc, msg_offset) "test_2" btree (msg_offset, created_utc) "test_3" btree (created_utc, msg_offset) "test_4" btree (created_utc, msg_offset DESC) "type_idx" btree (type) Any help understanding what is happening here, or guidance to improve performance would be much appreciated. Regards, Adam |