On 07/03/2016 11:04 AM, Adrian Klaver wrote:
On 07/03/2016 08:49 AM, Andy Colson wrote:
On 07/03/2016 10:35 AM, Adrian Klaver wrote:
On 07/03/2016 08:06 AM, Andy Colson wrote:
Hi all,
I have a master (web1) and two slaves (web2, webserv), one slave is
quite far from the master, the db is 112 Gig, so pg_basebackup is my
last resort.
I followed the page here:
https://www.postgresql.org/docs/9.5/static/pgupgrade.html
including the rsync stuff. I practiced it _twice_, once in PG 9.5 beta,
and again a week ago, on two VM's I created locally. Both practice
sessions worked perfect.
I just ran it on the live databases. The master seems ok, its running
PG 9.5 now, I can login to it, and no errors in the log.
Neither slave works. After I'd gotten done with the pgupgrade steps,
both slaves gave me this error:
FATAL: database system identifier differs between the primary and
standby
Sure enough pg_controldata show'd their database system id different
(all three web1, web2, webserv were different. no matches at all), so
I'm assuming the rsync didnt rsync right, or I missed a step and ran it
to early, or something ... I'm not quite sure.
I needed to get the live website back up and running again, so I let the
master go, ran analyze, and when it was finished, used the steps here to
try and resync:
https://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
on Master:
select pg_start_backup('clone',true);
rsync -av --exclude pg_xlog --exclude postgresql.conf /pub/pg95/*
web2:/pub/pg95/
select pg_stop_backup();
rsync -av /pub/pg95/pg_xlog web2:/pub/pg95/
Not sure about above rsync, that seems to undo what you did previously.
Also was the remote directory empty when you did this?
Not sure what you mean by undo. pgupgrade.html page, step 10, has you
rsync the master to the slave, so the pg95 directory is hard linked to
the pg93, which save's a ton to bandwidth when your servers are cross
county.
I understand I am just trying to figure out what mixing methods (pg-upgrade, pg_start_backup) is doing?
In particular the section on pg_start_backup:
https://www.postgresql.org/docs/9.5/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
starts with:
"Ensure that WAL archiving is enabled and working."
and from I gather that is not the case.
The slave log says it reached a consistent state, and is accepting connections, which tells me it should run ok, maybe not with the newest data, but at least not get:
FATAL: cache lookup failed for database 16401
Also was the remote directory empty when you did this?
Now that I think about this more, I think you're on to something. I'm trying to get an _exact_ copy of the master db onto the slave. Checking rsync man, it matches only on size and modified time, and I didn't include deletes.
I'm going to re-try with this:
select pg_start_backup('clone',true);
rsync -av --delete --checksum --exclude pg_xlog --exclude postgresql.conf /pub/pg95/* web2:/pub/pg95/
select pg_stop_backup();
rsync -av --delete --checksum /pub/pg95/pg_xlog web2:/pub/pg95/
That should make sure the copies are exact.
-Andy
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general