On 17 March 2015 at 20:33, Deole, Pushkar (Pushkar) <pdeole@xxxxxxxxx> wrote:
-- The documentation says that all the existing nodes need to be restarted while adding a new node since the existing nodes need to establish connection to the new node.
It sounds like you're talking about BDR here.
If so, that requirement will no longer apply in the coming 0.9.0 release: nodes may be added without restarting existing nodes. If you'd like to try it out, grab the code from the bdr-plugin/next branch in git.
However, this doesn’t seem feasible for production deployments because existing nodes might be serving clients which would fail if we need to restart them.
(The following isn't BDR-specific at all; it applies to Pg HA in general.)
If a client cannot cope with a backend disconnecting then that's a buggy client. Yes, that makes most clients out there buggy.
It's a very common, and wrong, programming practice to assume that if a connection is usable when you start a transaction then it'll stay that way. It's equally wrong to assume that a transaction abort won't occur just because you avoid the things known to cause one (SERIALIZABLE isolation, etc). Either assumption sets you up for problems, because your app will fail visibly (errors to the user, etc) whenever you restart the DB server, cancel a query for load reasons, the DB does crash recovery after a backend panic/crash, or your application deadlocks with itself or something else and triggers the deadlock detector.
IMO apps should always do their work in a retry loop. If the tx is aborted for some reason, retry the whole tx. If the connection is closed, reconnect and retry. Only if you can't complete the work after a few tries and a timeout should you give up with an error.
If your apps aren't written that way then you clearly don't mind showing errors to the user that much after all.
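To make the retry loop idea concrete, here's a rough sketch in Python. It isn't tied to any particular driver: the exception classes, run_with_retry, and its parameters are all names I've made up for illustration, and you'd map your driver's real errors onto them (e.g. SQLSTATE 40001 serialization_failure and 40P01 deadlock_detected for the "retry the tx" case). I've also assumed that hitting either the attempt limit or the timeout is enough to give up; adjust to taste.

```python
import time


class TransientTxError(Exception):
    """Hypothetical: the tx was aborted but the connection is still usable
    (deadlock, serialization failure, cancelled query)."""


class ConnectionLost(Exception):
    """Hypothetical: the backend went away (server restart, crash recovery)."""


def run_with_retry(connect, work, max_attempts=5, timeout=30.0,
                   backoff=0.1, sleep=time.sleep):
    """Run work(conn) as one whole transaction, retrying transient failures.

    connect() returns a new connection; work(conn) must begin, do all of its
    work, and commit, so that re-running it from scratch is safe.
    """
    deadline = time.monotonic() + timeout
    conn = None
    attempt = 0
    while True:
        attempt += 1
        try:
            if conn is None:
                conn = connect()      # (re)connect only when needed
            return work(conn)         # success: hand back the result
        except ConnectionLost as exc:
            conn = None               # drop the dead connection, reconnect next time
            error = exc
        except TransientTxError as exc:
            error = exc               # connection is fine, just redo the whole tx

        # Assumption: give up once EITHER limit is reached.
        if attempt >= max_attempts or time.monotonic() >= deadline:
            raise RuntimeError(f"gave up after {attempt} attempts") from error
        sleep(backoff * attempt)      # simple linear backoff between tries
```

The important part is that work() contains the entire transaction, so a retry never resumes half-way through; anything that isn't one of the two transient cases still propagates to the caller straight away.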
Is there a mechanism a new node gets added on the fly ?
It's been added in the coming 0.9.0 release.