27.02.2015 14:42, Sergey Shchukin
пишет:
27.02.2015 14:11, Sergey Shchukin
пишет:
27.02.2015 11:52, Jim Nasby пишет:
On 2/26/15 12:25 AM, Sergey Shchukin wrote:
Hi Radovan !
Thank you for the reply. The question is that this table is
not a
subject for a massive updates/deletes.
Is there any additional traces except from perf or pg_top to
trace what
replica is doing at the particular moment when we are
lagging in replay?
To see locks or spins or sleeps etc..
Please don't top-post.
What version is this? What is max_standby_streaming_delay set
to?
Thank you!
-
Best regards,
Sergey Shchukin
24.02.2015 19:05, Radovan Jablonovsky пишет:
This looks like more issue for
pgsql-general mailing list.
Possible solutions
1) Set specific autovacuum parameters on the big table.
The autovacuum
could vacuum table on multiple runs based on the
thresholds and cost
settings
Example of setting specific values of autovacuum and
analyze for
table. It should be adjusted for your system, work load,
table usage, etc:
alter table "my_schema"."my_big_table" set (fillfactor =
80,
autovacuum_enabled = true, autovacuum_vacuum_threshold =
200,
autovacuum_analyze_threshold = 400,
autovacuum_vacuum_scale_factor =
0.05, autovacuum_analyze_scale_factor = 0.005,
autovacuum_vacuum_cost_delay = 10,
autovacuum_vacuum_cost_limit = 5000);
2) Could be to partition the large table on master site
and vacuum it
partition by partition.
On Tue, Feb 24, 2015 at 6:42 AM, Sergey Shchukin
<shchukin.s.a@xxxxxxxxx
<mailto:shchukin.s.a@xxxxxxxxx>>
wrote:
Hi all!
May someone help me with the issue in the apply
process on the
replica. We have a stream replication and after vacuum
stops
working with a big table we get a "freeze" in applying
data on the
replica database. It looks like this:
Tue Feb 24 15:04:51 MSK 2015 Stream:
MASTER-masterdb:79607136410456 SLAVE:79607136410456
Replay:79607136339456 :: REPLAY 69 KBytes
(00:00:00.294485 seconds)
Tue Feb 24 15:04:52 MSK 2015 Stream:
MASTER-masterdb:79607137892672 SLAVE:79607137715392
Replay:79607137715392 :: REPLAY 173 KBytes
(00:00:00.142605 seconds)
Tue Feb 24 15:04:53 MSK 2015 Stream:
MASTER-masterdb:79607139327776 SLAVE:79607139241816
Replay:79607139241816 :: REPLAY 84 KBytes
(00:00:00.05223 seconds)
Tue Feb 24 15:04:54 MSK 2015 Stream:
MASTER-masterdb:79607141134776 SLAVE:79607141073344
Replay:79607141080032 :: REPLAY 54 KBytes
(00:00:00.010603 seconds)
Tue Feb 24 15:04:55 MSK 2015 Stream:
MASTER-masterdb:79607143085176 SLAVE:79607143026440
Replay:79607143038040 :: REPLAY 46 KBytes
(00:00:00.009506 seconds)
Tue Feb 24 15:04:56 MSK 2015 Stream:
MASTER-masterdb:79607145111280 SLAVE:79607145021384
Replay:79607145025664 :: REPLAY 83 KBytes
(00:00:00.006795 seconds)
Tue Feb 24 15:04:57 MSK 2015 Stream:
MASTER-masterdb:79607146564424 SLAVE:79607146478336
Replay:79607146501264 :: REPLAY 61 KBytes
(00:00:00.00701 seconds)
Tue Feb 24 15:04:58 MSK 2015 Stream:
MASTER-masterdb:79607148160680 SLAVE:79607148108352
Replay:79607147369320 :: REPLAY 773 KBytes
(00:00:00.449702 seconds)
Tue Feb 24 15:04:59 MSK 2015 Stream:
MASTER-masterdb:79607150220688 SLAVE:79607150159632
Replay:79607150171312 :: REPLAY 48 KBytes
(00:00:00.006594 seconds)
Tue Feb 24 15:05:00 MSK 2015 Stream:
MASTER-masterdb:79607152365360 SLAVE:79607152262696
Replay:79607152285240 :: REPLAY 78 KBytes
(00:00:00.007042 seconds)
Tue Feb 24 15:05:02 MSK 2015 Stream:
MASTER-masterdb:79607154049848 SLAVE:79607154012624
Replay:79607153446800 :: REPLAY 589 KBytes
(00:00:00.513637 seconds)
Tue Feb 24 15:05:03 MSK 2015 Stream:
MASTER-masterdb:79607155229992 SLAVE:79607155187864
Replay:79607155188312 :: REPLAY 41 KBytes
(00:00:00.004773 seconds)
Tue Feb 24 15:05:04 MSK 2015 Stream:
MASTER-masterdb:79607156833968 SLAVE:79607156764128
Replay:79607156785488 :: REPLAY 47 KBytes
(00:00:00.006846 seconds)
Tue Feb 24 15:05:05 MSK 2015 Stream:
MASTER-masterdb:79607158419848 SLAVE:79607158344856
Replay:79607158396352 :: REPLAY 23 KBytes
(00:00:00.005228 seconds)
Tue Feb 24 15:05:06 MSK 2015 Stream:
MASTER-masterdb:79607160004776 SLAVE:79607159962400
Replay:79607159988888 :: REPLAY 16 KBytes
(00:00:00.003162 seconds)
*--here apply process just stops*
Tue Feb 24 15:05:07 MSK 2015 Stream:
MASTER-masterdb:79607161592048 SLAVE:79607161550576
Replay:79607160986064 :: REPLAY 592 KBytes
(00:00:00.398376 seconds)
Tue Feb 24 15:05:08 MSK 2015 Stream:
MASTER-masterdb:79607163272840 SLAVE:79607163231384
Replay:79607160986064 :: REPLAY 2233 KBytes
(00:00:01.446759 seconds)
Tue Feb 24 15:05:09 MSK 2015 Stream:
MASTER-masterdb:79607164958632 SLAVE:79607164904448
Replay:79607160986064 :: REPLAY 3879 KBytes
(00:00:02.497181 seconds)
Tue Feb 24 15:05:10 MSK 2015 Stream:
MASTER-masterdb:79607166819560 SLAVE:79607166777712
Replay:79607160986064 :: REPLAY 5697 KBytes
(00:00:03.543107 seconds)
Tue Feb 24 15:05:11 MSK 2015 Stream:
MASTER-masterdb:79607168595280 SLAVE:79607168566536
Replay:79607160986064 :: REPLAY 7431 KBytes
(00:00:04.589736 seconds)
Tue Feb 24 15:05:12 MSK 2015 Stream:
MASTER-masterdb:79607170372064 SLAVE:79607170252480
Replay:79607160986064 :: REPLAY 9166 KBytes
(00:00:05.635918 seconds)
Tue Feb 24 15:05:13 MSK 2015 Stream:
MASTER-masterdb:79607171829480 SLAVE:79607171714144
Replay:79607160986064 :: REPLAY 10589 KBytes
(00:00:06.688115 seconds)
Tue Feb 24 15:05:14 MSK 2015 Stream:
MASTER-masterdb:79607173152488 SLAVE:79607173152488
Replay:79607160986064 :: REPLAY 11881 KBytes
(00:00:07.736993 seconds)
Tue Feb 24 15:05:15 MSK 2015 Stream:
MASTER-masterdb:79607174149968 SLAVE:79607174149968
Replay:79607160986064 :: REPLAY 12855 KBytes
(00:00:08.78538 seconds)
Tue Feb 24 15:05:16 MSK 2015 Stream:
MASTER-masterdb:79607176448344 SLAVE:79607176252088
Replay:79607160986064 :: REPLAY 15100 KBytes
(00:00:09.835184 seconds)
Tue Feb 24 15:05:17 MSK 2015 Stream:
MASTER-masterdb:79607177632216 SLAVE:79607177608224
Replay:79607160986064 :: REPLAY 16256 KBytes
(00:00:10.926493 seconds)
Tue Feb 24 15:05:18 MSK 2015 Stream:
MASTER-masterdb:79607179432960 SLAVE:79607179378096
Replay:79607160986064 :: REPLAY 18015 KBytes
(00:00:11.97989 seconds)
Tue Feb 24 15:05:19 MSK 2015 Stream:
MASTER-masterdb:79607180893384 SLAVE:79607180874256
Replay:79607160986064 :: REPLAY 19441 KBytes
(00:00:13.028921 seconds)
Tue Feb 24 15:05:20 MSK 2015 Stream:
MASTER-masterdb:79607182596224 SLAVE:79607182552272
Replay:79607160986064 :: REPLAY 21104 KBytes
(00:00:14.079497 seconds)
Tue Feb 24 15:05:21 MSK 2015 Stream:
MASTER-masterdb:79607183935312 SLAVE:79607183902592
Replay:79607160986064 :: REPLAY 22411 KBytes
(00:00:15.127679 seconds)
Tue Feb 24 15:05:23 MSK 2015 Stream:
MASTER-masterdb:79607185165880 SLAVE:79607185094032
Replay:79607160986064 :: REPLAY 23613 KBytes
(00:00:16.175132 seconds)
Tue Feb 24 15:05:24 MSK 2015 Stream:
MASTER-masterdb:79607187196920 SLAVE:79607187169368
Replay:79607160986064 :: REPLAY 25596 KBytes
(00:00:17.221981 seconds)
Tue Feb 24 15:05:25 MSK 2015 Stream:
MASTER-masterdb:79607188943856 SLAVE:79607188885952
Replay:79607160986064 :: REPLAY 27302 KBytes
(00:00:18.274362 seconds)
Tue Feb 24 15:05:26 MSK 2015 Stream:
MASTER-masterdb:79607190489400 SLAVE:79607190443160
Replay:79607160986064 :: REPLAY 28812 KBytes
(00:00:19.319987 seconds)
Tue Feb 24 15:05:27 MSK 2015 Stream:
MASTER-masterdb:79607192089312 SLAVE:79607192054048
Replay:79607160986064 :: REPLAY 30374 KBytes
(00:00:20.372305 seconds)
Tue Feb 24 15:05:28 MSK 2015 Stream:
MASTER-masterdb:79607193736800 SLAVE:79607193690056
Replay:79607160986064 :: REPLAY 31983 KBytes
(00:00:21.421359 seconds)
Tue Feb 24 15:05:29 MSK 2015 Stream:
MASTER-masterdb:79607195968648 SLAVE:79607195901296
Replay:79607160986064 :: REPLAY 34163 KBytes
(00:00:22.471334 seconds)
Tue Feb 24 15:05:30 MSK 2015 Stream:
MASTER-masterdb:79607197808840 SLAVE:79607197737720
Replay:79607160986064 :: REPLAY 35960 KBytes
(00:00:23.52269 seconds)
Tue Feb 24 15:05:31 MSK 2015 Stream:
MASTER-masterdb:79607199571144 SLAVE:79607199495976
Replay:79607160986064 :: REPLAY 37681 KBytes
(00:00:24.577615 seconds)
Tue Feb 24 15:05:32 MSK 2015 Stream:
MASTER-masterdb:79607201206104 SLAVE:79607201100392
Replay:79607160986064 :: REPLAY 39277 KBytes
(00:00:25.624604 seconds)
Tue Feb 24 15:05:33 MSK 2015 Stream:
MASTER-masterdb:79607203174208 SLAVE:79607203111136
Replay:79607160986064 :: REPLAY 41199 KBytes
(00:00:26.67059 seconds)
Tue Feb 24 15:05:34 MSK 2015 Stream:
MASTER-masterdb:79607204792888 SLAVE:79607204741600
Replay:79607160986064 :: REPLAY 42780 KBytes
(00:00:27.719088 seconds)
Tue Feb 24 15:05:35 MSK 2015 Stream:
MASTER-masterdb:79607206453216 SLAVE:79607206409032
Replay:79607160986064 :: REPLAY 44401 KBytes
(00:00:28.766647 seconds)
Tue Feb 24 15:05:36 MSK 2015 Stream:
MASTER-masterdb:79607208225344 SLAVE:79607208142176
Replay:79607160986064 :: REPLAY 46132 KBytes
(00:00:29.811434 seconds)
perf shows the following functions on the top
+ 22.50% postmaster [kernel.kallsyms] [k]
copy_user_generic_string
+ 8.48% postmaster postgres [.]
hash_search_with_hash_value
after 10 minutes or so the apply process continue to
work
Tue Feb 24 15:13:25 MSK 2015 Stream:
MASTER-masterdb:79608758742560 SLAVE:79608758718008
Replay:79607160986064 :: REPLAY 1560309 KBytes
(00:08:19.009653
seconds)
Tue Feb 24 15:13:26 MSK 2015 Stream:
MASTER-masterdb:79608759203608 SLAVE:79608759189680
Replay:79607160986064 :: REPLAY 1560759 KBytes
(00:08:20.057877
seconds)
Tue Feb 24 15:13:27 MSK 2015 Stream:
MASTER-masterdb:79608759639680 SLAVE:79608759633224
Replay:79607160986064 :: REPLAY 1561185 KBytes
(00:08:21.104723
seconds)
Tue Feb 24 15:13:28 MSK 2015 Stream:
MASTER-masterdb:79608760271200 SLAVE:79608760264128
Replay:79607160986064 :: REPLAY 1561802 KBytes
(00:08:22.148546
seconds)
Tue Feb 24 15:13:30 MSK 2015 Stream:
MASTER-masterdb:79608760622920 SLAVE:79608760616656
Replay:79607160986064 :: REPLAY 1562145 KBytes
(00:08:23.196645
seconds)
Tue Feb 24 15:13:31 MSK 2015 Stream:
MASTER-masterdb:79608761122040 SLAVE:79608761084584
Replay:79607160986064 :: REPLAY 1562633 KBytes
(00:08:24.240653
seconds)
Tue Feb 24 15:13:32 MSK 2015 Stream:
MASTER-masterdb:79608761434200 SLAVE:79608761426080
Replay:79607160986064 :: REPLAY 1562938 KBytes
(00:08:25.289429
seconds)
Tue Feb 24 15:13:33 MSK 2015 Stream:
MASTER-masterdb:79608761931008 SLAVE:79608761904808
Replay:79607160986064 :: REPLAY 1563423 KBytes
(00:08:26.338498
seconds)
*--apply starts*
Tue Feb 24 15:13:34 MSK 2015 Stream:
MASTER-masterdb:79608762360568 SLAVE:79608762325712
Replay:79607163554680 :: REPLAY 1561334 KBytes
(00:08:25.702423
seconds)
Tue Feb 24 15:13:35 MSK 2015 Stream:
MASTER-masterdb:79608762891224 SLAVE:79608762885928
Replay:79607166466488 :: REPLAY 1559008 KBytes
(00:08:25.011046
seconds)
Tue Feb 24 15:13:36 MSK 2015 Stream:
MASTER-masterdb:79608763681920 SLAVE:79608763667256
Replay:79607167054056 :: REPLAY 1559207 KBytes
(00:08:25.827531
seconds)
Tue Feb 24 15:13:37 MSK 2015 Stream:
MASTER-masterdb:79608764207088 SLAVE:79608764197744
Replay:79607175610296 :: REPLAY 1551364 KBytes
(00:08:21.182428
seconds)
Tue Feb 24 15:13:38 MSK 2015 Stream:
MASTER-masterdb:79608764857920 SLAVE:79608764832432
Replay:79607183599632 :: REPLAY 1544197 KBytes
(00:08:16.742467
seconds)
Tue Feb 24 15:13:39 MSK 2015 Stream:
MASTER-masterdb:79608765323360 SLAVE:79608765281408
Replay:79607186862176 :: REPLAY 1541466 KBytes
(00:08:15.569874
seconds)
Tue Feb 24 15:13:40 MSK 2015 Stream:
MASTER-masterdb:79608765848240 SLAVE:79608765824520
Replay:79607186862176 :: REPLAY 1541978 KBytes
(00:08:16.620932
seconds)
All this is a result of completion of "vacuum verbose
analyze
master_table" on the master site
Any help would be appreciated
--
Best regards,
Sergey Shchukin
--
*Radovan Jablonovsky* | SaaS DBA | Phone 1-403-262-6519
(ext. 7256) |
Fax 1-403-233-8046
Hi Jim,
The version is PostgreSQL 9.3.6 on x86_64 RHEL 6.6
show
max_standby_streaming_delay;
max_standby_streaming_delay
-----------------------------
30s
Again, after the vacuum finished
on
my table I got locks in apply process on replica - see
lag_byte
masterdb01d/masterdb M # vacuum verbose rtable.rtable_uidl;
INFO: 00000: vacuuming "rtable.rtable_uidl"
LOCATION: lazy_scan_heap, vacuumlazy.c:438
INFO: 00000: scanned index "pk_rtable_uidl" to remove 6 row
versions
DETAIL: CPU 240.80s/183.19u sec elapsed 703.85 sec.
LOCATION: lazy_vacuum_index, vacuumlazy.c:1335
INFO: 00000: "rtable_uidl": removed 6 row versions in 6
pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
LOCATION: lazy_vacuum_heap, vacuumlazy.c:1169
INFO: 00000: index "pk_rtable_uidl" now contains 3763411079
row versions in 32755911 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
LOCATION: lazy_cleanup_index, vacuumlazy.c:1387
INFO: 00000: "rtable_uidl": found 6 removable, 1426488
nonremovable row versions in 12734 out of 26047416 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 3 unused item pointers.
0 pages are entirely empty.
CPU 241.37s/184.04u sec elapsed 705.45 sec.
LOCATION: lazy_scan_heap, vacuumlazy.c:1101
VACUUM
Time: 705685.954 ms
masterdb01d/postgres M # \g
-[ RECORD 1 ]----+------------------------------
procpid | 21487
usesysid | 16413
usename | repl
application_name | walreceiver
client_addr |
client_hostname | masterdb01e
client_port | 35261
backend_start | 2015-02-27 13:02:27.203938+03
state | streaming
sent_location | 494B/CB30B530
write_location | 494B/CB30B530
flush_location | 494B/CB30B530
replay_location | 494B/A02B9070 <<< stopped
here!1
sync_priority | 0
sync_state | async
total_lag_byte | 721757376
total_lag_nice | 688 MB
lag_byte | 721757376
lag_byte_nice | 688 MB
--
Best regards,
Sergey Shchukin
Hi All!
The issue is repeating still...
Some updates:
The size of the rtable_uidl is about 3 773 185 761 rows, quite a big
Before vacuum
schema_size
| schema_size_with_indexes
-------------+--------------------------
199 GB | 450 GB
rtabledb01e/rtabledb
R # select * from pg_stat_all_tables where schemaname =
'rtable' and relname ='rtable_uidl';
-[ RECORD 1 ]-----+-----------
relid | 16511
schemaname | rtable
relname | rtable_uidl
seq_scan | 2
seq_tup_read | 7546016953
idx_scan | 3008619
idx_tup_fetch | 44139478
n_tup_ins | 0
n_tup_upd | 0
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 0
n_dead_tup | 0
last_vacuum | [null]
last_autovacuum | [null]
last_analyze | [null]
last_autoanalyze | [null]
vacuum_count | 0
autovacuum_count | 0
analyze_count | 0
autoanalyze_count | 0
Server spec: 2 Xeon E5-2660, 128GB RAM, disk subsystem
/dev/md2 /var/lib/pgsql/9.3/data
/dev/md3 /var/lib/pgsql/9.3/data/pg_xlog
raids are raid10 on top of 8x300GB intel SSD disks. Both raids use
the same disks which are split into two parts (ex. md2:sdb1..,
md3:sdb2...).
It should be mentioned that we see significant disc activity on
replica DBS during the lockup of apply process (see disks_load.png)
OS: RHEL 6.6
Non default parameters - please check the attachment
After the vacuum
rtabledb01d/rtabledb
M # vacuum verbose rtable.rtable_uidl;
INFO: 00000: vacuuming "rtable.rtable_uidl"
LOCATION: lazy_scan_heap, vacuumlazy.c:438
INFO: 00000: scanned index "pk_rtable_uidl" to remove 3 row
versions
DETAIL: CPU 251.85s/165.91u sec elapsed 743.44 sec.
LOCATION: lazy_vacuum_index, vacuumlazy.c:1335
INFO: 00000: "rtable_uidl": removed 3 row versions in 3 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
LOCATION: lazy_vacuum_heap, vacuumlazy.c:1169
INFO: 00000: index "pk_rtable_uidl" now contains 3773254753
row versions in 32846328 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
LOCATION: lazy_cleanup_index, vacuumlazy.c:1387
INFO: 00000: "rtable_uidl": found 3 removable, 2589545
nonremovable row versions in 20398 out of 26102225 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 253.20s/167.60u sec elapsed 749.44 sec.
LOCATION: lazy_scan_heap, vacuumlazy.c:1101
VACUUM
Time: 750141.401 ms
date
------------
2015-03-03
(1 row)
timetz
--------------------
16:35:50.245227+03
(1 row)
Replay stopped on replicas
Tue Mar 3
16:35:12 MSK 2015 Stream: MASTER-rtabledb01d:81829147840528
SLAVE:81829147765224 Replay:81827156622184 :: REPLAY
1944550 KBytes (00:07:49.816045 seconds)
Tue Mar 3 16:35:17 MSK 2015 Stream:
MASTER-rtabledb01d:81829156031576 SLAVE:81829155989056 Replay:81827156622184
:: REPLAY 1952549 KBytes (00:07:54.87067 seconds)
Tue Mar 3 16:35:22 MSK 2015 Stream:
MASTER-rtabledb01d:81829165013208 SLAVE:81829164949312 Replay:81827156622184
:: REPLAY 1961320 KBytes (00:07:59.927685 seconds)
Tue Mar 3 16:35:27 MSK 2015 Stream:
MASTER-rtabledb01d:81829172941288 SLAVE:81829172887000 Replay:81827156622184
:: REPLAY 1969062 KBytes (00:08:04.977663 seconds)
Tue Mar 3 16:35:32 MSK 2015 Stream:
MASTER-rtabledb01d:81829181088512 SLAVE:81829181063304 Replay:81827156622184
:: REPLAY 1977018 KBytes (00:08:10.033499 seconds)
Tue Mar 3 16:35:37 MSK 2015 Stream:
MASTER-rtabledb01d:81829191442216 SLAVE:81829191364192 Replay:81827156622184
:: REPLAY 1987129 KBytes (00:08:15.085862 seconds)
Tue Mar 3 16:35:42 MSK 2015 Stream:
MASTER-rtabledb01d:81829204580736 SLAVE:81829204444136 Replay:81827156622184
:: REPLAY 1999960 KBytes (00:08:20.13815 seconds)
Tue Mar 3 16:35:47 MSK 2015 Stream:
MASTER-rtabledb01d:81829218243240 SLAVE:81829218115272 Replay:81827156622184
:: REPLAY 2013302 KBytes (00:08:25.190515 seconds)
After the vacuum
schema_size
| schema_size_with_indexes
-------------+--------------------------
199 GB | 450 GB
rtabledb01d/rtabledb M # select * from pg_stat_all_tables
where schemaname = 'rtable' and relname ='rtable_uidl';
-[ RECORD 1 ]-----+------------------------------
relid | 16511
schemaname | rtable
relname | rtable_uidl
seq_scan | 1
seq_tup_read | 14223712
idx_scan | 1370133
idx_tup_fetch | 28596708
n_tup_ins | 12473820
n_tup_upd | 0
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 3771361529
n_dead_tup | 0
last_vacuum | 2015-03-03 16:27:23.231513+03
last_autovacuum | 2015-03-03 07:11:50.891016+03
last_analyze | 2015-02-26 19:50:28.097629+03
last_autoanalyze | 2015-03-03 14:09:14.463953+03
vacuum_count | 4
autovacuum_count | 5
analyze_count | 1
autoanalyze_count | 3
Thanks in advance!
Best regards,
Sergey Shchukin
|