> 21 авг. 2020 г., в 17:26, Jehan-Guillaume de Rorthais <jgdr@xxxxxxxxxx> написал(а): > > On Thu, 20 Aug 2020 15:16:10 +0300 > Based on setup per node, you can probably add > 'synchronous_commit=remote_write' in the common conf. Nope. I set 'synchronous_commit=remote_write' only for 3 and 4 node clusters. Two node clusters don't have this setting (use default), because they don't have the synchronous commit. > >> [...] >> pacemaker config, specific for this cluster: >> [...] > > why did you add "monitor interval=15"? No harm, but it is redundant with > "monitor interval=16 role=Master" and "monitor interval=17 role=Slave". I can't remember clearly. :) Look what happens without it. + pcs -f configured_cib.xml resource create krogan2DB ocf:heartbeat:pgsqlms bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/krogan2 recovery_template=/var/lib/pgsql/krogan2.paf meta master notify=true resource-stickiness=10 Warning: changing a monitor operation interval from 15 to 16 to make the operation unique Warning: changing a monitor operation interval from 16 to 17 to make the operation unique So trivial monitor always exists by default with interval 15. My real command pcs -f configured_cib.xml resource create krogan2DB ocf:heartbeat:pgsqlms bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/krogan2 recovery_template=/var/lib/pgsql/krogan2.paf op monitor interval=15 timeout=10 monitor interval=16 role=Master timeout=15 monitor interval=17 role=Slave timeout=10 meta master notify=true resource-stickiness=10 Looked like I needed to add all this to change "timeout" parameter for the monitor operations and I needed for interval parameter to point on the specific monitor operation. Looked like the default timeout 10 was not enough for the "master". > By the way, nice catch to exclude RO IP when master score is > negative. > > You should probably consider putting your logs outside of your PGDATA, or > even better: send your PostgreSQL/Pacemaker logs to a dedicated syslog node > outside. Adding the hostname in the prefix of each log line might probably help > a lot. I have put this into my ToDo notebook. >> [...] >> 10:24:55.906 LOG: entering standby mode >> 10:24:55.908 LOG: redo starts at 0/15000028 >> 10:24:55.909 LOG: consistent recovery state reached at 0/15002300 >> 10:24:55.910 LOG: database system is ready to accept read only connections >> 10:24:55.928 LOG: started streaming WAL from primary at 0/16000000 on tl 3 >> 10:26:37.308 FATAL: terminating walreceiver due to timeout > > Timeout because of SIGSTOP on primary here. Sure >> 10:26:37.308 LOG: invalid record length at 0/1600C4D8: wanted 24, got 0 >> 10:30:55.965 LOG: received promote request > > Promotion from Pacemaker here. Yep > What happened during more than 4 minutes between the timeout and the promotion? It's one of the problem, which you may improve. :) The pacemaker reaction is the longest for STOP signal test, usually near 5 minutes. The pacemaker tried to make different things (for instance "demote") and wait for different timeouts. > >> 10:30:55.965 FATAL: terminating walreceiver process due to administrator cmd >> 10:30:55.966 LOG: redo done at 0/1600C4B0 >> 10:30:55.966 LOG: last completed transaction was at log time 10:25:38.76429 >> 10:30:55.968 LOG: selected new timeline ID: 4 >> 10:30:56.001 LOG: archive recovery complete >> 10:30:56.005 LOG: database system is ready to accept connections > >> The slave with didn't reconnected replication, tuchanka3c. Also I separated >> logs copied from the old master by a blank line: >> >> [...] >> >> 10:20:25.168 LOG: database system was interrupted; last known up at 10:20:19 >> 10:20:25.180 LOG: entering standby mode >> 10:20:25.181 LOG: redo starts at 0/11000098 >> 10:20:25.183 LOG: consistent recovery state reached at 0/11000A68 >> 10:20:25.183 LOG: database system is ready to accept read only connections >> 10:20:25.193 LOG: started streaming WAL from primary at 0/12000000 on tl 3 >> 10:25:05.370 LOG: could not send data to client: Connection reset by peer >> 10:26:38.655 FATAL: terminating walreceiver due to timeout >> 10:26:38.655 LOG: record with incorrect prev-link 0/1200C4B0 at 0/1600C4D8 > > This message appear before the effective promotion of tuchanka3b. Do you have > logs about what happen *after* the promotion? This is end of the slave log. Nothing. Just absent replication. > Reading at this error, it seems like record at 0/1600C4D8 references the > previous one in WAL 0/12000000. So the file referenced as 0/16 have either > corrupted data or was 0/12 being recycled, but not zeroed correctly, as v11 > always do no matter what (no wal_init_zero there). Okey, may be in v12 it will be fixed. > That's why I'm wondering how you built your standbys, from scratch? By special scripts. :) This project already on GitHub and I am waiting for the final solution of my boss to open it. And it will take some time to translate README to English. After this I'll link the repository here. > > [...] >> >>>> Well, this can be workarouned by turning off wal_receiver_timeout. > > Just because the walreceiver keeps waiting on its socket and don't pay > attention to data on disk I suppose. I suppose because it don't pay attention on the absents of the data, and don't finish the connection by timeout. But it may finish it in other way by other reason to reconnect to a new master later. > >>>> Now PostgreSQL-STOP works fine, but the problem is still >>>> exists with other test. ForkBomb simulates an out of memory situation. In >>>> this case a slave sometimes don't reconnect to the new master too, with >>>> errors in log: >>>> >>>> 10:09:43.99 [1417] FATAL: could not receive data from WAL stream: server >>>> closed the connection unexpectedly This probably means the server >>>> terminated abnormally before or while processing the request. >>>> 10:09:43.992 [1413] LOG: invalid record length at 0/D8014278: wanted 24, >>>> got 0 >>> >>> I suspect the problem is somewhere else. The first message here is probably >>> related to your primary being fenced, the second one is normal. After your >>> IP moved to the recently promoted primary, your standby are supposed to >>> reconnect with no problem. >> >> I believe that problem with the ForkBomb has the same nature. The case with >> PostgreSQL-STOP and default wal_receiver_timeout can be easily repeated, it >> rises always by `killall -STOP postgres` on the master, as I can see. But the >> case with wal_receiver_timeout=0 and ForkBomb happens rather rare, not always. > > because of resources pressure, I suppose the OOM killer behavior is not > predictable. Sometime it will kill your walsender, sometime not. And sometime, > the watchdog itself is supposed to kill the node before anyone else. I think will be better first of all to fix PostgreSQL STOP test, because it is easily repeatable. And after this I'll test the case with ForkBomb, one day will be enough.