On 14 July 2017 at 00:09, Zhu, Joshua <jzhu@xxxxxxxxxxxxx> wrote:
Found these log entries on one of the other nodes:
t=2017-07-13 08:35:34 PDT p=27292 a=DEBUG: 00000: found valid replication identifier 15
t=2017-07-13 08:35:34 PDT p=27292 a=LOCATION: bdr_establish_connection_and_slot, bdr.c:604
t=2017-07-13 08:35:34 PDT p=27292 a=ERROR: 53400: no free replication state could be found for 15, increase max_replication_slots
Increased max_replication_slots, things are looking good now, thanks.
This does bring up a couple of questions:
- Given that this repeated removal/rejoining exercise does not actually increase the number of nodes, yet still uses up replication slots, shouldn't removing a node also automatically free the replication slot allocated to it?
Yes, it should. Open issue. A patch would be welcomed.
- Or is there a way to manually free slots that are no longer needed? (They don't seem to show up in the pg_replication_slots view; I made sure to use pg_drop_replication_slot when they do show up there.)
It'll be complaining about replication identifiers ("origins" in 9.6) rather than slots; see pg_replication_identifier.
- If there is such a thing, what is the rule of thumb for the best value of max_replication_slots (is it somehow related to the value of max_wal_senders as well) with respect to, say, the maximum number of nodes we intend to support?
I think that's covered in the docs, but it's safe to err fairly high. The cost of extra slots is minimal.
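As a rough sketch only: the numbers below are assumptions for illustration, not values taken from the BDR docs. For a group you expect to grow to about 8 nodes, something like this on each node leaves headroom for nodes being removed and rejoined:

```
# postgresql.conf (illustrative values only; changing either setting requires a server restart)
# Assumption: a group expected to grow to about 8 nodes, so up to 7 peers per node.
max_replication_slots = 20   # at least one per peer, plus headroom for rejoins and leftover identifiers
max_wal_senders = 20         # at least one per peer connection, plus headroom for backups and joining nodes
```

Check the docs for the exact minimums for your version; the point is just that both settings should comfortably exceed the number of peers.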