Hi,

> On 3 Sep 2020, at 14:39, Jehan-Guillaume de Rorthais <jgdr@xxxxxxxxxx> wrote:
>
> On Mon, 24 Aug 2020 18:45:42 +0300
> Олег Самойлов <splarv@xxxxx> wrote:
>
>>> On 21 Aug 2020, at 17:26, Jehan-Guillaume de Rorthais <jgdr@xxxxxxxxxx>
>>> wrote:
>>>
>>> On Thu, 20 Aug 2020 15:16:10 +0300
>>> Based on setup per node, you can probably add
>>> 'synchronous_commit=remote_write' in the common conf.
>>
>> Nope. I set 'synchronous_commit=remote_write' only for 3 and 4 node clusters.
>> [...]
>
> Then I suppose your previous message had an error as it shows three
> nodes tuchanka3a, tuchanka3b and tuchanka3c (no 4th node), all with remote_write
> in krogan3.conf. But anyway.

I tested 4 different types of clusters. Clusters 1 and 2 have two nodes and
thus don't reveal this bug. Clusters 3 and 4 have 3 and 4 nodes respectively,
and there the bug is observed. I used cluster 3 as the example.

>
>>>> [...]
>>>> pacemaker config, specific for this cluster:
>>>> [...]
>>>
>>> Why did you add "monitor interval=15"? No harm, but it is redundant with
>>> "monitor interval=16 role=Master" and "monitor interval=17 role=Slave".
>>
>> I can't remember clearly. :) Look what happens without it.
>>
>> + pcs -f configured_cib.xml resource create krogan2DB ocf:heartbeat:pgsqlms
>> bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/krogan2
>> recovery_template=/var/lib/pgsql/krogan2.paf meta master notify=true
>> resource-stickiness=10
>> Warning: changing a monitor operation interval from 15 to 16 to make the
>> operation unique
>> Warning: changing a monitor operation interval from 16 to 17 to make the
>> operation unique
>
> Something fishy here. This command lacks op monitor settings. Pacemaker doesn't
> add any default monitor operation with a default interval if you don't give one
> at resource creation.
>
> If you create such a resource with no monitoring, the cluster will start/stop
> it when needed, but will NOT check for its health.
> See:
>
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-resource-monitoring.html

Maybe. But keep in mind that I use `pcs`; I do not edit the XML file directly.
And I use a rather old Pacemaker: the default package in CentOS 7 is
pacemaker-1.1.21-4.el7.x86_64, while your documentation link is for
Pacemaker 2.0. But never mind, this does not concern the bug under discussion.

>
>> So a trivial monitor always exists by default, with interval 15.
>
> Nope.

This is not true for CentOS 7. For this example I removed my monitor options:

pcs cluster cib original_cib.xml
cp original_cib.xml configured_cib.xml
pcs -f configured_cib.xml resource create krogan3DB ocf:heartbeat:pgsqlms bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/krogan3 recovery_template=/var/lib/pgsql/krogan3.paf meta master notify=true resource-stickiness=10
pcs -f configured_cib.xml resource create krogan3IP ocf:heartbeat:IPaddr2 nic=eth0 cidr_netmask=24 ip=192.168.89.35
pcs -f configured_cib.xml resource create krogan3s1IP ocf:heartbeat:IPaddr2 nic=eth0 cidr_netmask=24 ip=192.168.89.36
pcs -f configured_cib.xml resource create krogan3s2IP ocf:heartbeat:IPaddr2 nic=eth0 cidr_netmask=24 ip=192.168.89.37
pcs -f configured_cib.xml constraint colocation add krogan3IP with master krogan3DB-master INFINITY
pcs -f configured_cib.xml constraint order promote krogan3DB-master then start krogan3IP symmetrical=false
pcs -f configured_cib.xml constraint order demote krogan3DB-master then stop krogan3IP symmetrical=false kind=Optional
pcs -f configured_cib.xml constraint location krogan3s1IP rule score=-INFINITY master-krogan3DB lt integer 0
pcs -f configured_cib.xml constraint location krogan3s2IP rule score=-INFINITY master-krogan3DB lt integer 0
pcs -f configured_cib.xml constraint colocation add krogan3s1IP with slave krogan3DB-master INFINITY
pcs -f configured_cib.xml constraint colocation add krogan3s2IP with slave krogan3DB-master INFINITY
pcs -f configured_cib.xml constraint colocation add krogan3s1IP with krogan3s2IP -1000
pcs -f configured_cib.xml constraint order start krogan3DB-master then start krogan3s1IP
pcs -f configured_cib.xml constraint order start krogan3DB-master then start krogan3s2IP
pcs cluster cib-push configured_cib.xml --wait diff-against=original_cib.xml

13:44:27 j0 root@tuchanka3a:~
# pcs resource show krogan3DB-master
 Master: krogan3DB-master
  Meta Attrs: notify=true resource-stickiness=10
  Resource: krogan3DB (class=ocf provider=heartbeat type=pgsqlms)
   Attributes: bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/krogan3 recovery_template=/var/lib/pgsql/krogan3.paf
   Operations: demote interval=0s timeout=120 (krogan3DB-demote-interval-0s)
               methods interval=0s timeout=5 (krogan3DB-methods-interval-0s)
               monitor interval=15 timeout=10 (krogan3DB-monitor-interval-15)
               monitor interval=16 role=Master timeout=10 (krogan3DB-monitor-interval-16)
               monitor interval=17 role=Slave timeout=10 (krogan3DB-monitor-interval-17)
               notify interval=0s timeout=60 (krogan3DB-notify-interval-0s)
               promote interval=0s timeout=30 (krogan3DB-promote-interval-0s)
               reload interval=0s timeout=20 (krogan3DB-reload-interval-0s)
               start interval=0s timeout=60 (krogan3DB-start-interval-0s)
               stop interval=0s timeout=60 (krogan3DB-stop-interval-0s)

>
>> My real command
>> pcs -f configured_cib.xml resource create krogan2DB ocf:heartbeat:pgsqlms
>> bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/krogan2
>> recovery_template=/var/lib/pgsql/krogan2.paf op monitor interval=15
>> timeout=10 monitor interval=16 role=Master timeout=15 monitor interval=17
>> role=Slave timeout=10 meta master notify=true resource-stickiness=10
>>
>> It looked like I needed to add all this to change the "timeout" parameter of
>> the monitor operations, and I needed the interval parameter to point to a
>> specific monitor operation.
>
> OK, I understand now. If you want to edit an existing resource, use "pcs
> resource update".
> Make sure to read the pcs manual on how to use it to
> edit/remove/add operations on a resource.

This is not so easy. To edit an existing operation I must know its "interval",
but in this case I am not sure what the interval of the monitor operation for
the Master role will be. :) Because of these warnings:

>>
>> Warning: changing a monitor operation interval from 15 to 16 to make the
>> operation unique
>> Warning: changing a monitor operation interval from 16 to 17 to make the
>> operation unique

I am not sure in what order this happens and what the result will be. That's
why I configured it the way I did. It just works.

>
>> Looked like the default timeout 10 was not enough for the "master".
>
> It's written in PAF doc. See:
> https://clusterlabs.github.io/PAF/configuration.html#resource-agent-actions
>
> Do not hesitate to report or submit some enhancements to the doc if needed.

Maybe the documentation has been improved. Thanks for pointing me to it. After
moving to CentOS 8 I will test with the parameters recommended in the
documentation.

>> It's one of the problems you could improve. :) Pacemaker's reaction is
>> the longest for the STOP signal test, usually about 5 minutes. Pacemaker tries
>> to do different things (for instance "demote") and waits for different
>> timeouts.
>
> Oh, understood, obviously, I should have thought about that. Well, I will not
> be able to improve anything here. You might want to adjust the various operation
> timeouts and lower the migration-threshold.

Yep. My test bed is rather slow, so I work only with the default timeouts, and
this is not a problem for my goal. The timeouts must be tuned later, on the
production server.
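On the monitor-interval problem: even when pcs renumbers the monitor operations, they can be identified after the fact, because `pcs resource show` prints every operation together with its id. Below is a minimal sketch, assuming the pcs 0.9 output layout shown above (CentOS 7); `monitor_op_id` is a hypothetical helper, not part of pcs, and it has not been tried against other pcs versions.

```shell
#!/bin/sh
# monitor_op_id ROLE
# Hypothetical helper, not part of pcs: reads `pcs resource show` output on
# stdin and prints the id of each monitor operation whose role equals ROLE.
# Pass an empty ROLE ("") to select monitor operations without a role.
# Assumes the pcs 0.9 layout, where each operation line ends with its id in
# parentheses, e.g.
#   monitor interval=16 role=Master timeout=10 (krogan3DB-monitor-interval-16)
monitor_op_id() {
    awk -v want="$1" '
        {
            is_monitor = 0
            role = ""
            # Scan the fields for the "monitor" action and a role= option.
            for (i = 1; i <= NF; i++) {
                if ($i == "monitor") is_monitor = 1
                if ($i ~ /^role=/) role = substr($i, 6)
            }
            # The operation id is the last field, wrapped in parentheses.
            if (is_monitor && $NF ~ /^\(.*\)$/ && role == want)
                print substr($NF, 2, length($NF) - 2)
        }'
}
```

For the configuration above, `pcs resource show krogan3DB-master | monitor_op_id Master` should print krogan3DB-monitor-interval-16. That id could then be passed to `pcs resource op remove <operation id>`, and the operation re-created with `pcs resource op add krogan3DB monitor interval=16 role=Master timeout=15`; this assumes the pcs 0.9 syntax shipped with CentOS 7 and is untested here.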
>
>>>> 10:30:55.965 FATAL: terminating walreceiver process due to administrator cmd
>>>> 10:30:55.966 LOG: redo done at 0/1600C4B0
>>>> 10:30:55.966 LOG: last completed transaction was at log time 10:25:38.76429
>>>> 10:30:55.968 LOG: selected new timeline ID: 4
>>>> 10:30:56.001 LOG: archive recovery complete
>>>> 10:30:56.005 LOG: database system is ready to accept connections
>>>
>>>> The slave that didn't reconnect replication is tuchanka3c. Also, I separated
>>>> the logs copied from the old master with a blank line:
>>>>
>>>> [...]
>>>>
>>>> 10:20:25.168 LOG: database system was interrupted; last known up at 10:20:19
>>>> 10:20:25.180 LOG: entering standby mode
>>>> 10:20:25.181 LOG: redo starts at 0/11000098
>>>> 10:20:25.183 LOG: consistent recovery state reached at 0/11000A68
>>>> 10:20:25.183 LOG: database system is ready to accept read only connections
>>>> 10:20:25.193 LOG: started streaming WAL from primary at 0/12000000 on tl 3
>>>> 10:25:05.370 LOG: could not send data to client: Connection reset by peer
>>>> 10:26:38.655 FATAL: terminating walreceiver due to timeout
>>>> 10:26:38.655 LOG: record with incorrect prev-link 0/1200C4B0 at 0/1600C4D8
>>>
>>> This message appears before the effective promotion of tuchanka3b. Do you
>>> have logs about what happens *after* the promotion?
>>
>> This is the end of the slave log. Nothing. Replication is just absent.
>
> This is unusual. Could you log some more details about replication
> tryouts to your PostgreSQL logs? Set log_replication_commands and lower
> log_min_messages to debug?

Sure, here are the PostgreSQL logs for the cluster tuchanka3. Tuchanka3a is the
old (failed) master.
Attachment:
tuchanka3a.log.xz
Tuchanka3b is the new (promoted) master.
Attachment:
tuchanka3b.log.xz
Tuchanka3c is the slave that lost replication.
Attachment:
tuchanka3c.log.xz
>>> That's why I'm wondering how you built your standbys, from scratch?
>>
>> By special scripts. :) This project is already on GitHub and I am waiting for
>> my boss's final decision to open it. And it will take some time to
>> translate the README to English. After this I'll link the repository here.
>
> I'll give it a look and try to reproduce if I find some time.

Heh, maybe you are the only person who may find this project useful. :) I hope
you got my last email with the description of how to reveal this bug in the
test bed and the screenshot of the result. Here is a new, updated patch for it:

diff --git a/test/failure1 b/test/failure1
index d81b9c8..9cc5dc6 100755
--- a/test/failure1
+++ b/test/failure1
@@ -105,15 +105,15 @@ function OutOfSpace {
 readonly -f OutOfSpace
 
 # Here can be commented out unnessesary tests
-break_node+=('Reset')
-break_node+=('PowerOff')
-break_node+=('ShutDown')
-break_node+=('UnLink')
-break_node+=('Postgres-KILL')
+#break_node+=('Reset')
+#break_node+=('PowerOff')
+#break_node+=('ShutDown')
+#break_node+=('UnLink')
+#break_node+=('Postgres-KILL')
 break_node+=('Postgres-STOP')
-break_node+=('SBD-STOP')
-break_node+=('ForkBomb')
-break_node+=('OutOfSpace')
+#break_node+=('SBD-STOP')
+#break_node+=('ForkBomb')
+#break_node+=('OutOfSpace')
 readonly -a break_node
 
 # Setup tmux panes
@@ -137,9 +137,9 @@ function test_node {
 	local f=$1
 	local h unbroken time
 	# random node from the cluster
-	h=$(random_word ${cluster_vms[$c]})
+	# h=$(random_word ${cluster_vms[$c]})
 	# Can be used to test only the first node in the cluster
-	# h=$(first_word ${cluster_vms[$c]})
+	h=$(first_word ${cluster_vms[$c]})
 	for unbroken in ${cluster_vms[$c]}
 	do
 		if [ $unbroken -ne $h ]
diff --git a/upload/common/postgresql.conf b/upload/common/postgresql.conf
index 1079e4b..a31b23d 100644
--- a/upload/common/postgresql.conf
+++ b/upload/common/postgresql.conf
@@ -29,11 +29,11 @@ restart_after_crash = off # пусть решает pacemaker pgsqlms
 # Disable wal_receiver_timeout, because with default wal_receiver_timeout there is a bug
 # in PostgreSQL, which breaks recconection of the replication on PostgreSQL-STOP test.
 # https://www.postgresql.org/message-id/60590EC6-4062-4F25-A49C-3948ED2A7D47%40ya.ru
-wal_receiver_timeout=0
+#wal_receiver_timeout=0
 # экономлю ОЗУ виртуалок
 shared_buffers = 32MB
 # экономлю на виртуальных винчестерах, поставил равным min_wal_size
 max_wal_size=80MB
 # This 2 options are to debug PostgreSQL replication
-#log_replication_commands = on
-#log_min_messages = debug
+log_replication_commands = on
+log_min_messages = debug
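Besides the debug logs enabled by the patch, the missing replication after the Postgres-STOP test could also be checked directly through the statistics views. A sketch only, assuming psql can connect as a superuser on each node of the PostgreSQL 11 setup described here; it needs a running cluster, so it is not runnable standalone.

```shell
#!/bin/sh
# On the standby that lost replication (tuchanka3c in the logs above):
# pg_stat_wal_receiver should return no row while no WAL receiver process runs.
psql -X -A -t -c "SELECT status, conninfo FROM pg_stat_wal_receiver"

# On the new primary (tuchanka3b): one row per connected standby is expected,
# so a missing row for tuchanka3c would confirm the symptom.
psql -X -A -t -c "SELECT application_name, state, sync_state FROM pg_stat_replication"
```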