Re: 9.3 to 9.5 upgrade problems

Vick Khera <vivek@xxxxxxxxx> · Sun, 3 Jul 2016 11:11:45 -0400

Binary replication requires the PostgreSQL versions to be identical. Also, once you ran pg_upgrade you altered one of the copies, so binary replication can no longer work against that one either.

On Sun, Jul 3, 2016 at 11:06 AM, Andy Colson <andy@xxxxxxxxxxxxxxx> wrote:
Hi all,

I have a master (web1) and two slaves (web2, webserv). One slave is quite far from the master and the db is 112 GB, so pg_basebackup is my last resort.

I followed the page here:

https://www.postgresql.org/docs/9.5/static/pgupgrade.html

including the rsync steps. I practiced it _twice_, once on PG 9.5 beta and again a week ago, on two VMs I created locally. Both practice sessions worked perfectly.

I just ran it on the live databases. The master seems OK: it's running PG 9.5 now, I can log in to it, and there are no errors in the log.

Neither slave works. After I finished the pg_upgrade steps, both slaves gave me this error:

FATAL:  database system identifier differs between the primary and standby

Sure enough, pg_controldata showed a different database system identifier on each machine (web1, web2 and webserv all differed, with no matches at all), so I'm assuming the rsync didn't sync correctly, or I missed a step, or I ran it too early. I'm not quite sure.
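
One way to confirm this kind of mismatch is to pull the "Database system identifier" line out of each server's pg_controldata output and compare the values. The Python sketch below is only an illustration and not part of the procedure I followed. The function names and the sample identifier values are invented, and it assumes the pg_controldata output has already been captured as text from each host.

```python
import re

def system_identifier(controldata_output):
    """Return the 'Database system identifier' value from pg_controldata text, or None."""
    match = re.search(r"^Database system identifier:\s*(\d+)\s*$",
                      controldata_output, flags=re.MULTILINE)
    return match.group(1) if match else None

def mismatched_hosts(outputs, primary):
    """Return the hosts whose identifier differs from the primary's (hypothetical helper)."""
    expected = system_identifier(outputs[primary])
    return sorted(host for host, text in outputs.items()
                  if host != primary and system_identifier(text) != expected)

# Hypothetical captured output; the identifier values are invented for illustration.
sample_outputs = {
    "web1": "Database system identifier:           1111111111111111111\n"
            "Database cluster state:               in production\n",
    "web2": "Database system identifier:           2222222222222222222\n"
            "Database cluster state:               in archive recovery\n",
    "webserv": "Database system identifier:           3333333333333333333\n"
               "Database cluster state:               in archive recovery\n",
}

print(mismatched_hosts(sample_outputs, "web1"))
```

A standby that was cloned from the master should report the same identifier as the master, so an empty list is the result to hope for.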

I needed to get the live website back up and running again, so I let the master go, ran ANALYZE, and when it had finished, used the steps here to try to resync:

https://wiki.postgresql.org/wiki/Binary_Replication_Tutorial

On the master:

select pg_start_backup('clone',true);

rsync -av --exclude pg_xlog --exclude postgresql.conf /pub/pg95/* web2:/pub/pg95/

select pg_stop_backup();

rsync -av /pub/pg95/pg_xlog web2:/pub/pg95/
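
The order of those four steps matters: the data directory is copied between pg_start_backup and pg_stop_backup, and pg_xlog is copied only after the backup has been stopped. The Python sketch below just lays that sequence out as a list of commands, without running anything. The function name and the psql invocation (`-U postgres -c ...`) are assumptions for illustration. It also uses a trailing slash on the source directory instead of the `/*` glob, so unlike the glob it would include dotfiles.

```python
import shlex

def resync_plan(data_dir, standby_host):
    """Return the resync commands in the order they are run on the master."""
    src = data_dir.rstrip("/")
    return [
        # 1. Put the master into backup mode (assumed psql invocation).
        ["psql", "-U", "postgres", "-c", "select pg_start_backup('clone', true);"],
        # 2. Copy the data directory, skipping WAL and the local config file.
        ["rsync", "-av", "--exclude", "pg_xlog", "--exclude", "postgresql.conf",
         f"{src}/", f"{standby_host}:{src}/"],
        # 3. Leave backup mode.
        ["psql", "-U", "postgres", "-c", "select pg_stop_backup();"],
        # 4. Copy the WAL directory last.
        ["rsync", "-av", f"{src}/pg_xlog", f"{standby_host}:{src}/"],
    ]

# Dry run: print the commands rather than executing them.
for command in resync_plan("/pub/pg95", "web2"):
    print(shlex.join(command))
```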

That ran pretty quickly, and pg_controldata now shows matching identifiers, but when I start the slave I get:

,,2016-07-03 06:06:57.173 CDT,: LOG:  entering standby mode

,,2016-07-03 06:06:57.205 CDT,: LOG:  redo starts at 369/D6002228

,,2016-07-03 06:06:57.984 CDT,: LOG:  consistent recovery state reached at 369/DCC5DB90

,,2016-07-03 06:06:57.984 CDT,: LOG:  database system is ready to accept read only connections

,,2016-07-03 06:06:57.984 CDT,: LOG:  invalid record length at 369/DD038ED0

,,2016-07-03 06:06:58.344 CDT,: LOG:  started streaming WAL from primary at 369/DD000000 on timeline 1

web,[unknown],2016-07-03 06:07:11.176 CDT,[local]: FATAL:  role "andy" does not exist
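
The WAL positions in the log above are written as two hexadecimal numbers, the high and low 32 bits of a 64-bit byte position. To compare them, they can be converted to plain integers. The helper below is a small sketch with a hypothetical name, using only positions that appear in the log. It suggests that the "invalid record length" position lies past the consistent recovery point, which would fit that message marking the end of the locally copied WAL just before streaming starts, though that reading is an assumption.

```python
def lsn_to_int(lsn):
    """Convert an LSN such as '369/D6002228' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

redo_start = lsn_to_int("369/D6002228")
consistent = lsn_to_int("369/DCC5DB90")
end_of_local_wal = lsn_to_int("369/DD038ED0")

# Bytes of WAL replayed before the standby reached a consistent state.
print(consistent - redo_start)
# Bytes of local WAL available beyond the consistent point.
print(end_of_local_wal - consistent)
```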

I can log in as myself on the master, but not on the slave. When I run "psql -U postgres" on the slave I get:

psql: FATAL:  cache lookup failed for database 16401

This is only on web2. It's close to web1, so I'm hoping I can get it fixed and then rsync it quickly to the faraway slave.

I'm at a loss here. Any hints or suggestions would be appreciated.

Thanks,

-Andy
