Re: BDR node removal and rejoin

Craig Ringer <craig@xxxxxxxxxxxxxxx> · Thu, 13 Jul 2017 14:58:34 +0800

On 13 July 2017 at 01:56, Zhu, Joshua <jzhu@xxxxxxxxxxxxx> wrote:

Thanks for the clarification.

Looks like I am running into a different issue: while trying to pin down precisely the steps (and the order in which to perform them) needed to remove/rejoin a node, I repeated the removal/rejoin exercise a number of times, and it got stuck again:

The status of the re-joining node (node4) on other nodes is “I”
The status of the re-joining node on node4 itself started at “I”, changed to “o”, then got stuck there
From the log file for node4, the following entries are constantly being generated:

2017-07-12 10:37:46 PDT [24943:bdr (6334686800251932108,1,43865,):receive:::1(33883)]DEBUG:  00000: received replication command: IDENTIFY_SYSTEM
2017-07-12 10:37:46 PDT [24943:bdr (6334686800251932108,1,43865,):receive:::1(33883)]LOCATION:  exec_replication_command, walsender.c:1309
2017-07-12 10:37:46 PDT [24943:bdr (6334686800251932108,1,43865,):receive:::1(33883)]DEBUG:  08003: unexpected EOF on client connection
2017-07-12 10:37:46 PDT [24943:bdr (6334686800251932108,1,43865,):receive:::1(33883)]LOCATION:  SocketBackend, postgres.c:355
2017-07-12 10:37:46 PDT [24944:bdr (6408408103171110238,1,24713,):receive:::1(33884)]DEBUG:  00000: received replication command: IDENTIFY_SYSTEM
2017-07-12 10:37:46 PDT [24944:bdr (6408408103171110238,1,24713,):receive:::1(33884)]LOCATION:  exec_replication_command, walsender.c:1309
2017-07-12 10:37:46 PDT [24944:bdr (6408408103171110238,1,24713,):receive:::1(33884)]DEBUG:  08003: unexpected EOF on client connection

Check the logs on the other end.
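
When going through a peer's log, it can help to drop the successful (SQLSTATE 00000) entries and the LOCATION lines first, so that whatever is actually failing stands out. A rough Python sketch, assuming the same log_line_prefix and verbose error reporting as in the excerpt above; LINE_RE and summarize() are names made up for this illustration, not part of BDR or PostgreSQL:

```python
import re
from collections import Counter

# Matches the message part of a line such as
#   "...receive:::1(33883)]DEBUG:  08003: unexpected EOF on client connection"
# Assumption: the prefix ends in "]" and each message carries a 5-character
# SQLSTATE, as in the excerpt. Adjust the pattern for other logging settings.
LINE_RE = re.compile(
    r"\](?P<level>[A-Z]+\d?):\s+(?P<sqlstate>[0-9A-Z]{5}):\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count (level, sqlstate, message) triples, skipping 00000 entries.

    LOCATION lines carry no SQLSTATE, so they do not match and are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None or m.group("sqlstate") == "00000":
            continue
        key = (m.group("level"), m.group("sqlstate"), m.group("msg").strip())
        counts[key] += 1
    return counts


# Four lines taken from the node4 excerpt above.
sample = [
    "2017-07-12 10:37:46 PDT [24943:bdr (6334686800251932108,1,43865,):receive:::1(33883)]DEBUG:  00000: received replication command: IDENTIFY_SYSTEM",
    "2017-07-12 10:37:46 PDT [24943:bdr (6334686800251932108,1,43865,):receive:::1(33883)]LOCATION:  exec_replication_command, walsender.c:1309",
    "2017-07-12 10:37:46 PDT [24943:bdr (6334686800251932108,1,43865,):receive:::1(33883)]DEBUG:  08003: unexpected EOF on client connection",
    "2017-07-12 10:37:46 PDT [24943:bdr (6334686800251932108,1,43865,):receive:::1(33883)]LOCATION:  SocketBackend, postgres.c:355",
]

if __name__ == "__main__":
    for (level, sqlstate, msg), n in summarize(sample).most_common():
        print(n, level, sqlstate, msg)
```

To run it over a real log, pass an open file object to summarize() in place of the sample list. Anything above DEBUG in the output from the peer nodes is the first thing to look at.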

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services