Re: BDR node removal and rejoin

Craig Ringer <craig@xxxxxxxxxxxxxxx> · Fri, 14 Jul 2017 12:46:55 +0800

On 14 July 2017 at 00:09, Zhu, Joshua <jzhu@xxxxxxxxxxxxx> wrote:

Found these log entries from one of the other nodes:

t=2017-07-13 08:35:34 PDT p=27292 a=DEBUG:  00000: found valid replication identifier 15
t=2017-07-13 08:35:34 PDT p=27292 a=LOCATION:  bdr_establish_connection_and_slot, bdr.c:604
t=2017-07-13 08:35:34 PDT p=27292 a=ERROR:  53400: no free replication state could be found for 15, increase max_replication_slots

Increased max_replication_slots, things are looking good now, thanks.

This does bring up a couple of questions:

Given that this repeated removal/rejoin exercise does not actually increase the number of nodes, yet it still used up
 the replication slots, shouldn’t removing a node also automatically free the replication slot allocated to it?

Yes, it should. Open issue. A patch would be welcomed.

Or is there a way to manually free up slots that are no longer needed? (They don’t seem to show up in the
 pg_replication_slots view; I made sure to use pg_drop_replication_slot when they do show up there.)

It'll be complaining about replication identifiers ("origins" in 9.6) rather than slots, which is why nothing shows up in pg_replication_slots; see pg_replication_identifier.
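
A rough sketch of how to look at these, to be treated as a starting point rather than a tested procedure. The first query uses the catalog named above as it exists in the BDR-patched 9.4. The remaining statements use the stock PostgreSQL 9.5+ origin catalog, status view and drop function. The origin name in the commented-out drop is a placeholder, and whether dropping a given identifier is safe on a live BDR group is something to confirm first.

```sql
-- BDR on patched 9.4: list the replication identifiers known to this node.
SELECT * FROM pg_catalog.pg_replication_identifier;

-- PostgreSQL 9.5 and later: the equivalent catalog is pg_replication_origin.
SELECT roident, roname
FROM pg_catalog.pg_replication_origin;

-- PostgreSQL 9.5 and later: origins that currently hold replication state.
-- The number of these is bounded by max_replication_slots.
SELECT local_id, external_id, remote_lsn, local_lsn
FROM pg_catalog.pg_replication_origin_status;

-- PostgreSQL 9.5 and later: drop one stale origin by name.
-- 'name_of_stale_origin' is a placeholder, not a real origin name.
-- SELECT pg_replication_origin_drop('name_of_stale_origin');
```

Comparing the names listed there against the nodes that are still part of the group should show which entries were left behind by removed nodes.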

Is there a rule of thumb for the best value of max_replication_slots with respect to, say, the maximum number of
 nodes the group is intended to support? And is it somehow related to the value of max_wal_senders as well?

I think that's covered in the docs, but it's safe to err fairly high. The cost of extra slots is minimal.
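
As an illustration only, the current values can be checked and raised as below. The value 20 is an arbitrary example of "fairly high", not a figure taken from the docs. Both settings only take effect after a server restart.

```sql
-- Show the current limits; context = 'postmaster' means a restart is needed to change them.
SELECT name, setting, context
FROM pg_catalog.pg_settings
WHERE name IN ('max_replication_slots', 'max_wal_senders');

-- How many slots are actually in use right now.
SELECT count(*) AS slots_in_use
FROM pg_catalog.pg_replication_slots;

-- Raise the limit (written to postgresql.auto.conf); 20 is an example value.
ALTER SYSTEM SET max_replication_slots = 20;
-- Restart PostgreSQL afterwards for the new value to apply.
```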

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services