Transactions to table, ChangeHistory, have recently become intermittently slow and is increasing becoming slower.
* No database configuration changes have been made recently
* We have run vacuum analyze
* We have tried backing up and reloading the table (data, indexes, etc)
Some transactions respond quickly (200 ms) and others take over 55 seconds (we cancel the query after 55 seconds – our timeout SLA). The problem has recently become very bad. It is the same query being issued but with different parameters.
If the transaction takes over 55 seconds and I run the query manually (with or without EXPLAIN ANALYZE) it returns quickly (a few hundred ms). In case I am looking at cache, I have a list of other queries (just different parameters) that have timed out
and when I run them (without the limit even) the response is very timely.
Any help or insight would be great.
NOTE: our application is connecting to the database via JDBC and we are using PreparedStatements. I have provided full details so all information is available, but please let me know if any other information is needed – thx in advance.
p306=> EXPLAIN ANALYZE SELECT * FROM ChangeHistory WHERE Category BETWEEN 'Employee' AND 'Employeezz' AND PrimaryKeyOfChange BETWEEN '312313' AND '312313!zz' ORDER BY ChgTS DESC, ChgUser DESC, Category DESC, PrimaryKeyOfChange DESC LIMIT 11;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Limit (cost=33.66..33.67 rows=1 width=136) (actual time=0.297..0.297 rows=11 loops=1)
-> Sort (cost=33.66..33.67 rows=1 width=136) (actual time=0.297..0.297 rows=11 loops=1)
Sort Key: chgts, chguser, category, primarykeyofchange
Sort Method: top-N heapsort Memory: 27kB
-> Index Scan using changehistory_idx4 on changehistory (cost=0.00..33.65 rows=1 width=136) (actual time=0.046..
0.239 rows=85 loops=1)
Index Cond: (((primarykeyofchange)::text >= '312313'::text) AND ((primarykeyofchange)::text <= '312313!zz'::
text))
Filter: (((category)::text >= 'Employee'::text) AND ((category)::text <= 'Employeezz'::text))
Total runtime: 0.328 ms
(8 rows)
>>>
History this week of counts with good response times vs timeouts.
| Date | Success # | Time Out # | Avg. Success Secs |
|------------+-----------+------------+-------------------|
| 2016-05-09 | 18 | 31 | 7.9 |
| 2016-05-10 | 17 | 25 | 10.5 |
| 2016-05-11 | 27 | 33 | 10.1 |
| 2016-05-12 | 68 | 24 | 9.9 |
>>> Sample transaction response times
| Timestamp | Tran ID | Resp MS | Resp CD
--------------------+----------------+---------+--------
2016-05-10 06:20:19 | ListChangeHist | 55,023 | TIMEOUT
2016-05-10 07:47:34 | ListChangeHist | 55,017 | TIMEOUT
2016-05-10 07:48:00 | ListChangeHist | 9,866 | OK
2016-05-10 07:48:10 | ListChangeHist | 2,327 | OK
2016-05-10 07:59:23 | ListChangeHist | 55,020 | TIMEOUT
2016-05-10 08:11:20 | ListChangeHist | 55,030 | TIMEOUT
2016-05-10 08:31:45 | ListChangeHist | 4,216 | OK
2016-05-10 08:35:09 | ListChangeHist | 7,898 | OK
2016-05-10 08:36:18 | ListChangeHist | 9,810 | OK
2016-05-10 08:36:56 | ListChangeHist | 55,027 | TIMEOUT
2016-05-10 08:37:33 | ListChangeHist | 46,433 | OK
2016-05-10 08:38:09 | ListChangeHist | 55,019 | TIMEOUT
2016-05-10 08:53:43 | ListChangeHist | 55,019 | TIMEOUT
2016-05-10 09:45:09 | ListChangeHist | 55,022 | TIMEOUT
2016-05-10 09:46:13 | ListChangeHist | 55,017 | TIMEOUT
2016-05-10 09:49:27 | ListChangeHist | 55,011 | TIMEOUT
2016-05-10 09:52:12 | ListChangeHist | 55,018 | TIMEOUT
2016-05-10 09:57:42 | ListChangeHist | 9,462 | OK
2016-05-10 10:05:21 | ListChangeHist | 55,016 | TIMEOUT
2016-05-10 10:05:29 | ListChangeHist | 136 | OK
2016-05-10 10:05:38 | ListChangeHist | 1,517 | OK
Artifacts
======================
$ >uname -a
SunOS ***** 5.10 Generic_150400-30 sun4v sparc sun4v
Memory : 254G phys mem, 207G free mem.
Processors: 32 - CPU is mostly 80% free
>>>
p306=> select version();
version
---------------------------------------------------------------------------------------------------
PostgreSQL 9.1.14 on sparc-sun-solaris2.10, compiled by gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath), 64-bit
>>>
p306=> \dt+ changehistory
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------------+-------+-------+-------+-------------
public | changehistory | table | p306 | 17 GB |
>>>
p306=> \di+ changehistory*
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-----------------------+-------+-------+---------------+---------+-------------
public | changehistory_idx1 | index | p306 | changehistory | 9597 MB |
public | changehistory_idx3 | index | p306 | changehistory | 11 GB |
public | changehistory_idx4 | index | p306 | changehistory | 4973 MB |
public | changehistory_pkey | index | p306 | changehistory | 2791 MB |
public | changehistory_search2 | index | p306 | changehistory | 9888 MB |
public | changehistory_search3 | index | p306 | changehistory | 10 GB |
public | changehistory_search4 | index | p306 | changehistory | 9240 MB |
public | changehistory_search5 | index | p306 | changehistory | 8373 MB |
(8 rows)
>>>
p306=> select count(*) from changehistory ;
count
------------
129,185,024
>>>
Show all (filtered)
======================================================
name | setting
---------------------------------+--------------------
autovacuum | on
autovacuum_analyze_scale_factor | 0.001
autovacuum_analyze_threshold | 500
autovacuum_freeze_max_age | 200000000
autovacuum_max_workers | 5
autovacuum_naptime | 1min
autovacuum_vacuum_cost_delay | 0
autovacuum_vacuum_cost_limit | -1
autovacuum_vacuum_scale_factor | 0.001
autovacuum_vacuum_threshold | 500
bgwriter_delay | 200ms
block_size | 8192
check_function_bodies | on
checkpoint_completion_target | 0.9
checkpoint_segments | 256
checkpoint_timeout | 1h
checkpoint_warning | 30s
client_encoding | UTF8
commit_delay | 0
commit_siblings | 5
cpu_index_tuple_cost | 0.005
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
cursor_tuple_fraction | 0.1
deadlock_timeout | 1s
default_statistics_target | 100
default_transaction_deferrable | off
default_transaction_isolation | read committed
default_transaction_read_only | off
default_with_oids | off
effective_cache_size | 8GB
from_collapse_limit | 8
fsync | on
full_page_writes | on
ignore_system_indexes | off
join_collapse_limit | 8
krb_caseins_users | off
lo_compat_privileges | off
maintenance_work_mem | 1GB
max_connections | 350
max_files_per_process | 1000
max_function_args | 100
max_identifier_length | 63
max_index_keys | 32
max_locks_per_transaction | 64
max_pred_locks_per_transaction | 64
max_prepared_transactions | 0
max_stack_depth | 2MB
max_wal_senders | 5
random_page_cost | 4
segment_size | 1GB
seq_page_cost | 1
server_encoding | UTF8
server_version | 9.1.14
shared_buffers | 2GB
sql_inheritance | on
statement_timeout | 0
synchronize_seqscans | on
synchronous_commit | on
synchronous_standby_names |
tcp_keepalives_count | 0
tcp_keepalives_idle | -1
tcp_keepalives_interval | 0
track_activities | on
track_activity_query_size | 1024
track_counts | on
track_functions | none
transaction_deferrable | off
transaction_isolation | read committed
transaction_read_only | off
transform_null_equals | off
update_process_title | on
vacuum_cost_delay | 0
vacuum_cost_limit | 200
vacuum_cost_page_dirty | 20
vacuum_cost_page_hit | 1
vacuum_cost_page_miss | 10
vacuum_defer_cleanup_age | 0
vacuum_freeze_min_age | 50000000
vacuum_freeze_table_age | 150000000
John Gorman |
Manager of Production Support, Architecture, Release Engineering | Eldorado |
a Division of MPHASIS |
www.eldoinc.com/ |
5353 North 16th Street, Suite 400, Phoenix, Arizona 85016-3228 |
Tel 602.604.3100 | Fax: 602.604.3115