On Wed, Feb 29, 2012 at 3:22 AM, Jameison Martin <jameisonb@xxxxxxxxx> wrote:
> i don't think i've explained things very clearly. the implied contradiction
> is that i'd be using asynchronous replication to catch up a slave after a
> slave failure and thus i'm losing the transactional consistency that i
> suggest i need. if a slave fails and is brought back online i am indeed
> proposing that it catch up with the master asynchronously; however, the
> slave wouldn't be promoted to a hot standby until it is completely caught up
> and could be reestablished as a synchronous replica (at least that is what
> i'd like to do in theory). so i'm proposing that a slave would never be a
> candidate for a HA failover unless it is completely in sync with a master:
> if there is no slave that is in sync with the master at the time the master
> fails, then the master would have to be recovered from the filesystem via
> traditional recovery. the fact that i envision 'catching up' a slave to a
> master using asynchronous replication is not particularly relevant to the
> transactional guarantees of the system as a whole if the slave is
> effectively unavailable while catching up.
>
> similarly, any slave that isn't caught up to its master would also not be
> eligible for queries.
>
> i can understand why the master might hang when there is no reachable
> replica during synchronous commit; this is exactly the right thing to do if
> you want to guarantee that you have at least 2 distinct spheres of
> durability. but i'd prefer to sacrifice the extra durability guarantee in
> favor of availability in this case, given that recovery from the file system
> is still an option should the master subsequently fail. my availability
> issue is that the master would clearly be hung/unavailable for an unbounded
> amount of time, without a strong guarantee about the time it might take to
> bring a replica back up, which is not acceptable in my case.
>
> if the master hangs commits because there is no active slave, i believe that
> an administrator would have to
>
> detect that there are no active slaves
> shut the master down
> disable synchronous replication
> bring the master back up

You don't need to restart the server to disable sync replication.
You can do that by emptying synchronous_standby_names in postgresql.conf
and reloading the configuration (e.g., with pg_ctl reload).

BTW, although you can also disable sync replication by setting
synchronous_commit to local in postgresql.conf, you should use
synchronous_standby_names for that purpose instead. Setting
synchronous_commit to local prevents new transactions (those started after
the setting is changed) from being blocked, but it cannot release
transactions that are already blocked waiting for a standby.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
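A minimal sketch of the postgresql.conf change described above, assuming the standby list is set directly in that file. The standby name 'standby1' is hypothetical; use whatever application_name your standby actually reports.

```
# postgresql.conf
# Before: commits wait for confirmation from the (hypothetical) standby 'standby1'.
#synchronous_standby_names = 'standby1'

# After: no synchronous standbys are listed, so replication becomes
# asynchronous, and commits already waiting for a standby are released
# once the server reloads its configuration.
synchronous_standby_names = ''
```

After saving the file, run pg_ctl reload (or SELECT pg_reload_conf(); as a superuser) so the running server re-reads it. No restart is needed.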