17.03.2015 13:22, Sergey Shchukin
пишет:
05.03.2015 11:25, Jim Nasby пишет:
On
2/27/15 5:11 AM, Sergey Shchukin wrote:
show max_standby_streaming_delay;
max_standby_streaming_delay
-----------------------------
30s
We both need to be more clear about which server we're talking
about (master or replica).
What are max_standby_streaming_delay and
max_standby_archive_delay set to *on the replica*?
My hope is that one or both of those is set to somewhere around
8 minutes on the replica. That would explain everything.
If that's not the case then I suspect what's happening is
there's something running on the replica that isn't checking for
interrupts frequently enough. That would also explain it.
When replication hangs, is the replication process using a lot
of CPU? Or is it just sitting there? What's the process status
for the replay process show?
Can you get a trace of the replay process on the replica when
this is happening to see where it's spending all it's time?
How are you generating these log lines?
Tue Feb 24 15:05:07 MSK 2015 Stream:
MASTER-masterdb:79607161592048 SLAVE:79607161550576
Replay:79607160986064 :: REPLAY 592 KBytes (00:00:00.398376
seconds)
Do you see the confl_* fields in pg_stat_database_conflicts on
the *replica* increasing?
Hi Jim,
max_standby_streaming_delay and max_standby_archive_delay both
are 30s on master and replica dbs
I don't see any specific or heavy workload during this issue with
a hanging apply process. Just a normal queries as usual.
But I see an increased disk activity during the time when the
apply issue is ongoing
DSK
| sdc | | busy 61% |
read 11511 | | write 4534 | KiB/r 46
| | KiB/w 4 | MBr/s 52.78
| | MBw/s 1.88 | avq 1.45
| | avio 0.38 ms |
DSK | sde | | busy 60% |
read 11457 | | write 4398 | KiB/r 46
| | KiB/w 4 | MBr/s 51.97
| | MBw/s 1.83 | avq 1.47
| | avio 0.38 ms |
DSK | sdd | | busy 60% |
read 9673 | | write 4538 | KiB/r 61
| | KiB/w 4 | MBr/s 58.24
| | MBw/s 1.88 | avq 1.47
| | avio 0.42 ms |
DSK | sdj | | busy 59% |
read 9576 | | write 4177 | KiB/r 63
| | KiB/w 4 | MBr/s 59.30
| | MBw/s 1.75 | avq 1.48
| | avio 0.43 ms |
DSK | sdh | | busy 59% |
read 9615 | | write 4305 | KiB/r 63
| | KiB/w 4 | MBr/s 59.23
| | MBw/s 1.80 | avq 1.48
| | avio 0.42 ms |
DSK | sdf | | busy 59% |
read 9483 | | write 4404 | KiB/r 63
| | KiB/w 4 | MBr/s 59.11
| | MBw/s 1.83 | avq 1.47
| | avio 0.42 ms |
DSK | sdi | | busy 59% |
read 11273 | | write 4173 | KiB/r 46
| | KiB/w 4 | MBr/s 51.50
| | MBw/s 1.75 | avq 1.43
| | avio 0.38 ms |
DSK | sdg | | busy 59% |
read 11406 | | write 4297 | KiB/r 46
| | KiB/w 4 | MBr/s 51.66
| | MBw/s 1.80 | avq 1.46
| | avio 0.37 ms |
Although it's not seems to be an upper IO limit.
Normally disks are busy at 20-45%
DSK
| sde | | busy 29% | read
6524 | | write 14426 | KiB/r 26
| | KiB/w 5 | MBr/s 17.08
| | MBw/s 7.78 | avq 10.46
| | avio 0.14 ms |
DSK | sdi | | busy 29% | read
6590 | | write 14391 | KiB/r 26
| | KiB/w 5 | MBr/s 17.19
| | MBw/s 7.76 | avq 8.75
| | avio 0.14 ms |
DSK | sdg | | busy 29% | read
6547 | | write 14401 | KiB/r 26
| | KiB/w 5 | MBr/s 16.94
| | MBw/s 7.60 | avq 7.28
| | avio 0.14 ms |
DSK | sdc | | busy 29% | read
6835 | | write 14283 | KiB/r 27
| | KiB/w 5 | MBr/s 18.08
| | MBw/s 7.74 | avq 8.77
| | avio 0.14 ms |
DSK | sdf | | busy 23% | read
3808 | | write 14391 | KiB/r 36
| | KiB/w 5 | MBr/s 13.49
| | MBw/s 7.78 | avq 12.88
| | avio 0.13 ms |
DSK | sdd | | busy 23% | read
3747 | | write 14229 | KiB/r 33
| | KiB/w 5 | MBr/s 12.32
| | MBw/s 7.74 | avq 10.07
| | avio 0.13 ms |
DSK | sdj | | busy 23% | read
3737 | | write 14336 | KiB/r 36
| | KiB/w 5 | MBr/s 13.16
| | MBw/s 7.76 | avq 10.48
| | avio 0.13 ms |
DSK | sdh | | busy 23% | read
3793 | | write 14362 | KiB/r 35
| | KiB/w 5 | MBr/s 13.29
| | MBw/s 7.60 | avq 8.61
| | avio 0.13 ms |
Also during the issue perf shows [k] copy_user_generic_string on
the top positions
14.09%
postmaster postgres [.] 0x00000000001b4569
10.25% postmaster [kernel.kallsyms] [k]
copy_user_generic_string
4.15% postmaster postgres [.]
hash_search_with_hash_value
2.08% postmaster postgres [.]
SearchCatCache
1.79% postmaster postgres [.]
LWLockAcquire
1.18% postmaster libc-2.12.so [.] memcpy
1.12% postmaster postgres [.] mdnblocks
Issue starts: at 19:43
Mon Mar
16 19:43:04 MSK 2015 Stream: MASTER-rdb04d:70837172337784
SLAVE:70837172314864 Replay:70837172316512 :: REPLAY 21
KBytes (00:00:00.006225 seconds)
Mon Mar 16 19:43:09 MSK 2015 Stream:
MASTER-rdb04d:70837177455624 SLAVE:70837177390968
Replay:70837176794376 :: REPLAY 646 KBytes (00:00:00.367305
seconds)
Mon Mar 16 19:43:14 MSK 2015 Stream:
MASTER-rdb04d:70837185005120 SLAVE:70837184961280
Replay:70837183253896 :: REPLAY 1710 KBytes
(00:00:00.827881 seconds)
Mon Mar 16 19:43:19 MSK 2015 Stream:
MASTER-rdb04d:70837190417984 SLAVE:70837190230232
Replay:70837183253896 :: REPLAY 6996 KBytes (00:00:05.873169
seconds)
Mon Mar 16 19:43:24 MSK 2015 Stream:
MASTER-rdb04d:70837198538232 SLAVE:70837198485000
Replay:70837183253896 :: REPLAY 14926 KBytes
(00:00:11.025561 seconds)
Mon Mar 16 19:43:29 MSK 2015 Stream:
MASTER-rdb04d:70837209961192 SLAVE:70837209869384
Replay:70837183253896 :: REPLAY 26081 KBytes
(00:00:16.068014 seconds)
We see [k]
copy_user_generic_string
12.90% postmaster
[kernel.kallsyms] [k] copy_user_generic_string
11.49% postmaster postgres [.]
0x00000000001f40c1
4.74% postmaster postgres [.]
hash_search_with_hash_value
1.86% postmaster postgres [.] mdnblocks
1.73% postmaster postgres [.]
LWLockAcquire
1.67% postmaster postgres [.]
SearchCatCache
25.71% postmaster [kernel.kallsyms] [k]
copy_user_generic_string
7.89% postmaster postgres [.]
hash_search_with_hash_value
4.66% postmaster postgres [.]
0x00000000002108da
4.51% postmaster postgres [.] mdnblocks
3.36% postmaster [kernel.kallsyms] [k] put_page
Issue stops: at 19:51:39
Mon Mar
16 19:51:24 MSK 2015 Stream: MASTER-rdb04d:70838904179344
SLAVE:70838903934392 Replay:70837183253896 :: REPLAY 1680591
KBytes (00:08:10.384679 seconds)
Mon Mar 16 19:51:29 MSK 2015 Stream:
MASTER-rdb04d:70838929994336 SLAVE:70838929873624
Replay:70837183253896 :: REPLAY 1705801 KBytes
(00:08:15.428773 seconds)
Mon Mar 16 19:51:34 MSK 2015 Stream:
MASTER-rdb04d:70838951993624 SLAVE:70838951899768
Replay:70837183253896 :: REPLAY 1727285 KBytes
(00:08:20.472567 seconds)
Mon Mar 16 19:51:39 MSK 2015 Stream:
MASTER-rdb04d:70838975297912 SLAVE:70838975180384
Replay:70837208050872 :: REPLAY 1725827 KBytes
(00:08:10.256935 seconds)
Mon Mar 16 19:51:44 MSK 2015 Stream:
MASTER-rdb04d:70839001502160 SLAVE:70839001412616
Replay:70837260116984 :: REPLAY 1700572 KBytes
(00:07:49.849511 seconds)
Mon Mar 16 19:51:49 MSK 2015 Stream:
MASTER-rdb04d:70839022866760 SLAVE:70839022751184
Replay:70837276732880 :: REPLAY 1705209 KBytes
(00:07:42.307364 seconds)
And copy_user_generic_string
goes down
+
13.43% postmaster postgres [.]
0x000000000023dc9a
+ 3.71% postmaster [kernel.kallsyms] [k]
copy_user_generic_string
+ 2.46% init [kernel.kallsyms] [k]
intel_idle
+ 2.30% postmaster postgres [.]
hash_search_with_hash_value
+ 2.01% postmaster postgres [.]
SearchCatCache
Could you clarify what types of traces did you mean? GDB?
To calculate slave and apply lag I use the following query at the
replica site
slave_lag=$($psql
-U monitor -h$s_host -p$s_port -A -t -c "SELECT
pg_xlog_location_diff(pg_last_xlog_receive_location(), '0/0')
AS receive" $p_db)
replay_lag=$($psql -U monitor -h$s_host -p$s_port -A -t -c
"SELECT pg_xlog_location_diff(pg_last_xlog_replay_location(),
'0/0') AS replay" $p_db)
replay_timediff=$($psql -U monitor -h$s_host -p$s_port -A -t
-c "SELECT NOW() - pg_last_xact_replay_timestamp() AS
replication_delay" $p_db)
master_lag=$($psql -U monitor -h$p_host -p$p_port -A -t -c
"SELECT pg_xlog_location_diff(pg_current_xlog_location(),
'0/0') AS offset" $p_db)
echo "$(date) Stream: MASTER-$p_host:$master_lag
SLAVE:$slave_lag Replay:$replay_lag :: REPLAY $(bc
<<< $master_lag/1024-$replay_lag/1024) KBytes
(${replay_timediff} seconds)"
-
Best regards,
Sergey Shchukin
One more thing
We have upgraded one of our shards to 9.4.1 and expectedly that did not help.
A few things to notice which may be useful.
1. When replay stops, startup process reads a lot from array with $PGDATA. In iotop and iostat we see the following:
Total DISK READ: 490.42 M/s | Total DISK WRITE: 3.82 M/s
TID PRIO USER DISK READ> DISK WRITE SWAPIN IO COMMAND
3316 be/4 postgres 492.34 M/s 0.00 B/s 0.00 % 39.91 % postgres: startup process
<...>
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
<...>
md2 0.00 0.00 6501.00 7.00 339.90 0.03 106.97 0.00 0.00 0.00 0.00
md3 0.00 0.00 0.00 1739.00 0.00 6.79 8.00 0.00 0.00 0.00 0.00
root@rpopdb04g ~ # fgrep 9.4 /proc/mounts
/dev/md2 /var/lib/pgsql/9.4/data ext4 rw,noatime,nodiratime,barrier=1,stripe=64,data="" 0 0
/dev/md3 /var/lib/pgsql/9.4/data/pg_xlog ext4 rw,noatime,nodiratime,barrier=0,stripe=64,data="" 0 0
root@rpopdb04g ~ #
2. The state of the startup process is changing in such a way:
root@rpopdb04g ~ # while true; do ps aux | grep '[s]tartup'; sleep 1; done
postgres 3316 26.6 3.2 4732052 4299260 ? Rs 18:04 8:11 postgres: startup process
postgres 3316 26.6 3.2 4732052 4299260 ? Ts 18:04 8:11 postgres: startup process
postgres 3316 26.6 3.2 4732052 4299260 ? Rs 18:04 8:12 postgres: startup process
postgres 3316 26.6 3.2 4732052 4299260 ? Ds 18:04 8:12 postgres: startup process
postgres 3316 26.6 3.2 4732052 4299260 ? Rs 18:04 8:13 postgres: startup process
postgres 3316 26.6 3.2 4732052 4299260 ? Rs 18:04 8:13 postgres: startup process
postgres 3316 26.6 3.2 4732052 4299260 ? Rs 18:04 8:14 postgres: startup process
postgres 3316 26.6 3.2 4732052 4299260 ? Ts 18:04 8:14 postgres: startup process
postgres 3316 26.6 3.2 4732052 4299260 ? Ds 18:04 8:15 postgres: startup process
postgres 3316 26.6 3.2 4732052 4299260 ? Rs 18:04 8:15 postgres: startup process
postgres 3316 26.7 3.2 4732052 4299260 ? Ds 18:04 8:15 postgres: startup process
postgres 3316 26.7 3.2 4732052 4299260 ? Ds 18:04 8:16 postgres: startup process
postgres 3316 26.7 3.2 4732052 4299260 ? Rs 18:04 8:16 postgres: startup process
postgres 3316 26.7 3.2 4732052 4299260 ? Rs 18:04 8:17 postgres: startup process
postgres 3316 26.7 3.2 4732052 4299260 ? Ts 18:04 8:17 postgres: startup process
postgres 3316 26.7 3.2 4732052 4299260 ? Rs 18:04 8:18 postgres: startup process
postgres 3316 26.7 3.2 4732052 4299260 ? Rs 18:04 8:18 postgres: startup process
postgres 3316 26.7 3.2 4732052 4299260 ? Ds 18:04 8:19 postgres: startup process
postgres 3316 26.8 3.2 4732052 4299260 ? Rs 18:04 8:19 postgres: startup process
postgres 3316 26.8 3.2 4732052 4299260 ? Rs 18:04 8:20 postgres: startup process
postgres 3316 26.8 3.2 4732052 4299260 ? Rs 18:04 8:20 postgres: startup process
postgres 3316 26.8 3.2 4732052 4299260 ? Ds 18:04 8:21 postgres: startup process
postgres 3316 26.8 3.2 4732052 4299260 ? Ds 18:04 8:22 postgres: startup process
postgres 3316 26.8 3.2 4732052 4299260 ? Rs 18:04 8:22 postgres: startup process
postgres 3316 26.8 3.2 4732052 4299260 ? Rs 18:04 8:23 postgres: startup process
postgres 3316 26.9 3.2 4732052 4299260 ? Rs 18:04 8:23 postgres: startup process
postgres 3316 26.9 3.2 4732052 4299260 ? Rs 18:04 8:24 postgres: startup process
postgres 3316 26.9 3.2 4732052 4299260 ? Ts 18:04 8:24 postgres: startup process
postgres 3316 26.9 3.2 4732052 4299260 ? Rs 18:04 8:25 postgres: startup process
^C
root@rpopdb04g ~ #
3. confl* fields in pg_stat_database_conflicts are always zero during the pausing of replay.
4. The stack-traces taken with GDB are not really informative. We will recompile PostgreSQL with —enable-debug option and run it on one of our replicas if needed. Since it is a production system we would like to do it last of all. But we will do it if anybody would not give us any ideas.
We did it. Most of the backtraces (taken while replay_location was not changing) looks like that:
[Thread debugging using libthread_db enabled] 0x00007f54a71444c0 in __read_nocancel () from /lib64/libc.so.6 #0 0x00007f54a71444c0 in __read_nocancel () from /lib64/libc.so.6 #1 0x000000000065d2f5 in FileRead (file=<value optimized out>, buffer=0x7f5470e6ba20 "\004", amount=8192) at fd.c:1286 #2 0x000000000067acad in mdread (reln=<value optimized out>, forknum=<value optimized out>, blocknum=311995, buffer=0x7f5470e6ba20 "\004") at md.c:679 #3 0x0000000000659b4e in ReadBuffer_common (smgr=<value optimized out>, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=311995, mode=RBM_NORMAL_NO_LOG, strategy=0x0, hit=0x7fff898a912f "") at bufmgr.c:476 #4 0x000000000065a61b in ReadBufferWithoutRelcache (rnode=..., forkNum=MAIN_FORKNUM, blockNum=311995, mode=<value optimized out>, strategy=<value optimized out>) at bufmgr.c:287 #5 0x00000000004cfb78 in XLogReadBufferExtended (rnode=..., forknum=MAIN_FORKNUM, blkno=311995, mode=RBM_NORMAL_NO_LOG) at xlogutils.c:324 #6 0x00000000004a3651 in btree_xlog_vacuum (lsn=71742288638464, record=0x1e48b78) at nbtxlog.c:522 #7 btree_redo (lsn=71742288638464, record=0x1e48b78) at nbtxlog.c:1144 #8 0x00000000004c903a in StartupXLOG () at xlog.c:6827 #9 0x000000000062f8bf in StartupProcessMain () at startup.c:224 #10 0x00000000004d3e9a in AuxiliaryProcessMain (argc=2, argv=0x7fff898a98a0) at bootstrap.c:416 #11 0x000000000062a99c in StartChildProcess (type=StartupProcess) at postmaster.c:5146 #12 0x000000000062e9e2 in PostmasterMain (argc=3, argv=<value optimized out>) at postmaster.c:1237 #13 0x00000000005c7d68 in main (argc=3, argv=0x1e22910) at main.c:228
So the questions are: 1. Is there anything we can tune right now? Except for not reading from replicas and partitioning this table. 2. Isn’t there still a function to determine that a buffer is not pinned in shared_buffers without reading it from disk? To optimize current behaviour in the future.
5. In one of the experiments replay stopped on 4115/56126DC0 xlog position. Here is a bit of pg_xlogdump output:
rpopdb04d/rpopdb M # select pg_xlogfile_name('4115/56126DC0');
pg_xlogfile_name
--------------------------
000000060000411500000056
(1 row)
Time: 0.496 ms
rpopdb04d/rpopdb M #
root@pg-backup04h /u0/rpopdb04/wals/0000000600004115 # /usr/pgsql-9.4/bin/pg_xlogdump 000000060000411500000056 000000060000411500000056 | fgrep 4115/56126DC0 -C10
rmgr: Heap len (rec/tot): 36/ 3948, tx: 3930143874, lsn: 4115/561226C8, prev 4115/56122698, bkp: 1000, desc: hot_update: rel 1663/16420/16737; tid 196267/4 xmax 3930143874 ; new tid 196267/10 xmax 0
rmgr: Transaction len (rec/tot): 12/ 44, tx: 3930143874, lsn: 4115/56123638, prev 4115/561226C8, bkp: 0000, desc: commit: 2015-03-19 18:26:27.725158 MSK
rmgr: Heap len (rec/tot): 58/ 90, tx: 3930143875, lsn: 4115/56123668, prev 4115/56123638, bkp: 0000, desc: hot_update: rel 1663/16420/16737; tid 23624/3 xmax 3930143875 ; new tid 23624/21 xmax 0
rmgr: Transaction len (rec/tot): 12/ 44, tx: 3930143875, lsn: 4115/561236C8, prev 4115/56123668, bkp: 0000, desc: commit: 2015-03-19 18:26:27.726432 MSK
rmgr: Heap len (rec/tot): 36/ 2196, tx: 3930143876, lsn: 4115/561236F8, prev 4115/561236C8, bkp: 1000, desc: hot_update: rel 1663/16420/16737; tid 123008/4 xmax 3930143876 ; new tid 123008/5 xmax 0
rmgr: Transaction len (rec/tot): 12/ 44, tx: 3930143876, lsn: 4115/56123F90, prev 4115/561236F8, bkp: 0000, desc: commit: 2015-03-19 18:26:27.727088 MSK
rmgr: Heap len (rec/tot): 36/ 7108, tx: 3930143877, lsn: 4115/56123FC0, prev 4115/56123F90, bkp: 1000, desc: hot_update: rel 1663/16420/16737; tid 34815/6 xmax 3930143877 ; new tid 34815/16 xmax 0
rmgr: Transaction len (rec/tot): 12/ 44, tx: 3930143877, lsn: 4115/56125BA0, prev 4115/56123FC0, bkp: 0000, desc: commit: 2015-03-19 18:26:27.728178 MSK
rmgr: Heap len (rec/tot): 36/ 4520, tx: 3930143878, lsn: 4115/56125BD0, prev 4115/56125BA0, bkp: 1000, desc: hot_update: rel 1663/16420/16737; tid 147863/5 xmax 3930143878 ; new tid 147863/16 xmax 0
rmgr: Transaction len (rec/tot): 12/ 44, tx: 3930143878, lsn: 4115/56126D90, prev 4115/56125BD0, bkp: 0000, desc: commit: 2015-03-19 18:26:27.728339 MSK
rmgr: Btree len (rec/tot): 20/ 52, tx: 0, lsn: 4115/56126DC0, prev 4115/56126D90, bkp: 0000, desc: vacuum: rel 1663/16420/16796; blk 31222118, lastBlockVacuumed 0
rmgr: Heap len (rec/tot): 36/ 6112, tx: 3930143879, lsn: 4115/56126DF8, prev 4115/56126DC0, bkp: 1000, desc: hot_update: rel 1663/16420/16737; tid 23461/26 xmax 3930143879 ; new tid 23461/22 xmax 0
rmgr: Heap2 len (rec/tot): 24/ 8160, tx: 0, lsn: 4115/561285F0, prev 4115/56126DF8, bkp: 1000, desc: clean: rel 1663/16420/16782; blk 21997709 remxid 0
rmgr: Transaction len (rec/tot): 12/ 44, tx: 3930143879, lsn: 4115/5612A5E8, prev 4115/561285F0, bkp: 0000, desc: commit: 2015-03-19 18:26:27.728805 MSK
rmgr: Heap2 len (rec/tot): 20/ 8268, tx: 0, lsn: 4115/5612A618, prev 4115/5612A5E8, bkp: 1000, desc: visible: rel 1663/16420/16782; blk 21997709
rmgr: Heap len (rec/tot): 36/ 7420, tx: 3930143880, lsn: 4115/5612C680, prev 4115/5612A618, bkp: 1000, desc: hot_update: rel 1663/16420/16737; tid 37456/8 xmax 3930143880 ; new tid 37456/29 xmax 0
rmgr: Transaction len (rec/tot): 12/ 44, tx: 3930143880, lsn: 4115/5612E398, prev 4115/5612C680, bkp: 0000, desc: commit: 2015-03-19 18:26:27.729141 MSK
rmgr: Heap len (rec/tot): 36/ 7272, tx: 3930143881, lsn: 4115/5612E3C8, prev 4115/5612E398, bkp: 1000, desc: hot_update: rel 1663/16420/16737; tid 23614/2 xmax 3930143881 ; new tid 23614/22 xmax 0
rmgr: Heap len (rec/tot): 150/ 182, tx: 0, lsn: 4115/56130048, prev 4115/5612E3C8, bkp: 0000, desc: inplace: rel 1663/16420/12764; tid 11/31
rmgr: Transaction len (rec/tot): 12/ 44, tx: 3930143881, lsn: 4115/56130100, prev 4115/56130048, bkp: 0000, desc: commit: 2015-03-19 18:26:27.729340 MSK
rmgr: Heap len (rec/tot): 43/ 75, tx: 3930143882, lsn: 4115/56130130, prev 4115/56130100, bkp: 0000, desc: insert: rel 1663/16420/16773; tid 10159950/26
rmgr: Btree len (rec/tot): 42/ 74, tx: 3930143882, lsn: 4115/56130180, prev 4115/56130130, bkp: 0000, desc: insert: rel 1663/16420/16800; tid 12758988/260
root@pg-backup04h /u0/rpopdb04/wals/0000000600004115 #
Any help would be really appropriate. Thanks in advance.
--
May the force be with you…
|