Re: postgres hot-standby questions.

"Graeme B. Bell" <grb@xxxxxxxxxxxxxxxxx> · Thu, 26 Mar 2015 15:43:08 +0000

On 26 Mar 2015, at 16:07, Scott Ribe <scott_ribe@xxxxxxxxxxxxxxxx> wrote:

> On Mar 26, 2015, at 8:17 AM, Graeme B. Bell <grb@xxxxxxxxxxxxxxxxx> wrote:
>> 
>> shutdown A (Master)
>> shutdown B (Standby)
>> (A and B  should be identical in terms of WAL and logical data at this point).
>> swap M/S configurations around
>> start B (Master)
> 
> First, note that you don't actually have to shut down B and swap master/slave configs. You can use trigger_file in recovery.conf.

Thanks Simon, that's a good suggestion. 

Do you know if using the trigger file makes any change to the DB or WAL which couldn't be replicated on the master when it comes back up? 

Actually, it's probably worth mentioning a bit more information about my use case, for anyone reading.

I am concerned about being able to do a clean, safe, quick swap *and soon after, a quick/clean/safe swap back again* without having to checksum the entire DB (potentially TBs of data) to be certain I'm not breaking something or losing something on the return trip.

Typically we have these big DBs but we want to make some little change which only takes 10-30 minutes. 

This is something that pg_rewind may help to address in 9.5, but I'd like to be able to do it now and we can't risk using pg_rewind in our production environment at present. 

Actually even with pg_rewind we have the tricky problem that a user could have a transaction accepted on the master, which never reaches the slave if you failover without a synchronous shutdown process. Such transactions could be rewound on the master DB with pg_rewind to allow it to become a slave, but we can't exactly email the users to let them know their completed transaction wasn't durable after all.

> In general, with async replication, you don't know that every last transaction is replicated at any moment. (And of course with synch replication, you add points of failure.)

With fully synchronous replication the performance issue is a real killer for us. 
I suppose what I'm wondering is about async normal operation, but sync operation during shutdown. 

It might seem a strange compromise but in practice something like 95% of the time when the server goes down it's a controlled downtime not a failure. So we would benefit from sync during controlled shutdown often but almost never during ordinary operation. So the cost/benefit for fully sync operation is terrible for us but the cost/benefit of sync-on-shutdown is huge.

I could try to script it using e.g. pg_last_xlog_receive_location,  pg_last_xlog_replay_location, but that doesn't help so much when the DB decides to complete shutdown while the packet is still on the wire, and the packet goes missing. 

> I don't actually know the answer to your questions, because in my use, the connection between server & replica is high-bandwidth low-latency. My routine is to shut down all services that access the db, then shut down the db. The progression of service shutdown pretty much guarantees that if the connection is up, the replica is up to date well before the master is shut down. So all I have to do is, after access is shut down use one of the many methods to check replication lag, just as a double-check that replication was working.
> 
> It does seem to me it would be a good idea to 1) document explicitly in 17.5 "Shutting Down the Server" what happens with async replication (I suspect all your guesses are correct) and 2) provide some option to wait for replication to finish.
> 
> The thing about 2 is that we probably don't want it to be automatic or default, because the nature of asynch replicas is that it's not uncommon to have ones far away, over slower less-reliable links, which may take an unknown amount of time to come up to date. It's not uncommon to want to bring the master down for a point update or some config change, and immediately bring it back up, in which case we don't want the shutdown delayed. It's also not uncommon to have multiple replicas, some close by that should probably be always up-to-date, and some far away for protection against big disasters, which may lag. So the "wait for replica" option to shut down would need a way to specify *which* replica/s we wanted to wait for.

I agree, a non-default option to do sync-shutdown sounds like what I'm thinking of. Good points about the multi-standby problem.

Graeme Bell

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin