Re: Recovery - New Slave PostgreSQL 9.2

John Scalia <jayknowsunix@xxxxxxxxx> · Sat, 9 Jan 2016 17:16:21 -0600

I'd recommend that you specify -X s, as just specifying -x or --xlog gives you the fetch method rather than stream. Also, the WAL directory listing you just provided indicates that your servers' timelines are far apart.

Now, you're saying that one system went down, which is why you're trying to do this, but was it the first slave that failed? Or did your primary fail? That would possibly explain why the timelines are different. If your primary failed and this standby took over, then its timeline would have incremented. So, if you're trying to put this one back as a slave, that's not really a trivial process. You'd have to set the old primary back up as a slave to the current primary, then execute another failover, this time back to your original primary, and then rebuild all the slaves all over again.
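
For reference, a streaming base backup along the lines recommended above might be invoked roughly as follows. This is a sketch under assumptions: the upstream address and target data directory are placeholders, the helper name `basebackup_cmd` is made up, and it only prints the command so it can be reviewed before anything is run. As far as I know, 9.2 accepts -X stream only with the plain output format, so it cannot be combined with --format=tar.

```shell
#!/bin/sh
# Print (do not run) a pg_basebackup command line that streams WAL during the backup.
# basebackup_cmd is a hypothetical helper; host and directory are caller-supplied.
#   --xlog-method=stream  is the long form of -X stream (-X s)
#   --format=plain        is assumed to be required for streaming on 9.2
basebackup_cmd() {
    host=$1
    target_dir=$2
    printf 'pg_basebackup --host=%s --port=5432 --username=replicator --pgdata=%s --format=plain --xlog-method=stream --progress\n' \
        "$host" "$target_dir"
}

# Example values only: the address and data directory are assumptions.
basebackup_cmd 192.168.100.2 /var/lib/pgsql/9.2/data
```

Streaming opens a second replication connection, so the upstream server needs a spare max_wal_senders slot for the duration of the backup.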

Just saying,
Jay

On Jan 9, 2016, at 3:48 PM, "drum.lucas@xxxxxxxxx" <drum.lucas@xxxxxxxxx> wrote:

Hi John,
> First, when you built the slave server, I'm assuming you used pg_basebackup and if you did, did you specify -X s in your command?

Yep. I ran the pg_basebackup into the new slave from ANOTHER SLAVE... 
ssh postgres@slave1 'pg_basebackup --pgdata=- --format=tar --label=bb_master --progress --host=localhost --port=5432 --username=replicator --xlog | pv --quiet --rate-limit 100M' | tar -x --no-same-owner

-X = --xlog

On my new slave, I've got all the WAL archives. (The master copies the WAL files over all the time...)
ls /var/lib/pgsql/9.2/wal_archive:
0000000200000C6A0000002D
0000000200000C6A0000002E

and not 
../wal_archive/0000000400000C68000000C8` not found
../wal_archive/00000005.history` not found
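
As a side note on the file names above: the first eight hex digits of a WAL segment name encode the timeline ID, so the archived segments and the requested one can be compared directly. A minimal sketch, assuming standard 24-character segment names; the function name `wal_timeline` is made up for illustration:

```shell
#!/bin/sh
# Print the timeline ID (in decimal) encoded in a WAL segment file name.
# Assumes the standard 24-hex-digit name: 8 digits of timeline, 16 of position.
# wal_timeline is a hypothetical helper name, not a PostgreSQL tool.
wal_timeline() {
    tli_hex=$(printf '%s' "$1" | cut -c1-8)
    printf '%d\n' "0x$tli_hex"
}

wal_timeline 0000000200000C6A0000002D   # a segment present in the archive
wal_timeline 0000000400000C68000000C8   # the segment the standby asked for
```

For the names in this thread it reports timeline 2 for the archived segments and timeline 4 for the requested one, which is the mismatch the replies point at.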

Remember that I'm trying to do a cascading replication (It was working with another slave. But the server went down and I'm trying to set up a new one)

> I would suggest, in spite of the 2TB size, rebuilding the standby servers with a proper pg_basebackup.

I've already run pg_basebackup more than once, and I always get the same error... :(

Is there anything else, guys? Please help, hehe.

Lucas Possamai
kinghost.co.nz

On 10 January 2016 at 10:33, John Scalia <jayknowsunix@xxxxxxxxx> wrote:
Hi,

I'm a little late to this thread, but in looking at the errors you originally posted, two things come to mind:

First, when you built the slave server, I'm assuming you used pg_basebackup and if you did, did you specify -X s in your command?

Second, the missing history file isn't an issue, in case you're unfamiliar with this. However, yeah, the missing WAL segment is, as well as the bad timeline error. Is that missing segment still on your primary? You know you could just copy it manually to your standby and start from that. As for the timeline error, that's disturbing to me, as it's claiming the primary is actually a failed-over standby. AFAIK, that's the main, if not the only, way timelines increment.

I would suggest, in spite of the 2TB size, rebuilding the standby servers with a proper pg_basebackup.
--
Jay

On Jan 9, 2016, at 2:19 PM, "drum.lucas@xxxxxxxxx" <drum.lucas@xxxxxxxxx> wrote:

Hi, thanks for your reply... I've been working on this problem for 20h =(
# cat postgresql.conf | grep synchronous_standby_names
#synchronous_standby_names = '' - It's commented 

# cat postgresql.conf |  grep application_name
log_line_prefix = '%m|%p|%q[%c]@%r|%u|%a|%d '
( %a = application name )

I can't resync the whole DB again, because it has 2TB of data :(

Is there anything else I can do?
Thank you

Lucas Possamai
kinghost.co.nz

On 10 January 2016 at 04:22, Shreeyansh Dba <shreeyansh2014@xxxxxxxxx> wrote:

On Sat, Jan 9, 2016 at 3:28 PM, drum.lucas@xxxxxxxxx <drum.lucas@xxxxxxxxx> wrote:
My recovery.conf was already like that! I was already doing it that way... I still have the problem =\

Is there anything I can do?

Lucas Possamai
kinghost.co.nz

On 9 January 2016 at 22:53, Shreeyansh Dba <shreeyansh2014@xxxxxxxxx> wrote:

Hi Lucas,

Yes, now recovery.conf looks good.
Hope this solves your problem.

Thanks and regards,
ShreeyanshDBA Team
Shreeyansh Technologies
www.shreeyansh.com

On Sat, Jan 9, 2016 at 3:07 PM, drum.lucas@xxxxxxxxx <drum.lucas@xxxxxxxxx> wrote:
Hi there!
Yep, it's correct: 
> It looks like you have a setup of A (Master) ---> B (Replica) ---> C (Replica, base backup from Replica B)

Master (A): 192.168.100.1
Slave1 (B): 192.168.100.2
Slave2 (C): 192.168.100.3

My recovery.conf in slave2(C) is:
restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
archive_cleanup_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/pg_archivecleaup_mv.bash -d "../wal_archive" "%r"'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.100.2 port=5432 user=replicator application_name=replication_slave02'
So it seems right to me... Is that what you mean?

Thanks

Lucas Possamai
kinghost.co.nz

On 9 January 2016 at 22:25, Shreeyansh Dba <shreeyansh2014@xxxxxxxxx> wrote:
On Sat, Jan 9, 2016 at 8:29 AM, drum.lucas@xxxxxxxxx <drum.lucas@xxxxxxxxx> wrote:
* NOTE: I ran the pg_basebackup from another STANDBY SERVER. Not from the MASTER

Lucas Possamai
kinghost.co.nz

On 9 January 2016 at 15:28, drum.lucas@xxxxxxxxx <drum.lucas@xxxxxxxxx> wrote:
Still trying to solve the problem... Can anyone help, please?

Lucas

Lucas Possamai
kinghost.co.nz

On 9 January 2016 at 14:45, drum.lucas@xxxxxxxxx <drum.lucas@xxxxxxxxx> wrote:
Sure... Here's the full information: http://superuser.com/questions/1023770/new-postgresql-slave-server-error-timeline

recovery.conf:
restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
archive_cleanup_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/pg_archivecleaup_mv.bash -d "../wal_archive" "%r"'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.100.XX port=5432 user=replicator application_name=replication_new_slave'

Lucas Possamai
kinghost.co.nz

On 9 January 2016 at 14:37, Ian Barwick <ian@xxxxxxxxxxxxxxx> wrote:
On 16/01/09 9:23, drum.lucas@xxxxxxxxx wrote:

> Hi all!
>
> I've done the pg_basebackup from the live server to a new slave server...
>
> I've recovered the WAL files, but now that I've configured it to replicate from the master (recovery.conf), I get this error:
>
> ../wal_archive/0000000400000C68000000C8` not found
> ../wal_archive/00000005.history` not found
>
> FATAL:  timeline 2 of the primary does not match recovery target timeline 1

Can you post the contents of your recovery.conf file, suitably anonymised if necessary?

Regards

Ian Barwick

Hi Lucas,

Following your question, I reproduced the same error:

cp: cannot stat `/pgdata/arch/00000003.history': No such file or directory
2016-01-09 14:11:42 IST FATAL:  timeline 1 of the primary does not match recovery target timeline 2

It looks like you have a setup of A (Master) ---> B (Replica) ---> C (Replica, base backup from Replica B)

It seems you reused the recovery.conf (written to replicate from the master to a slave) for the new replica C, and there is a high probability that the primary connection info in C's recovery.conf was not changed to Replica B's connection info.

In my testing, providing B's connection info in C's recovery.conf resolved the issue.

Please verify the primary connection info parameter in C's recovery.conf; correcting it might resolve your problem.
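
One quick way to run that check is to pull the host out of primary_conninfo and compare it with Replica B's address. A minimal sketch, assuming the setting sits on a single line in the style shown in this thread; the function name `conninfo_host` is made up for illustration:

```shell
#!/bin/sh
# Print the host= value from the primary_conninfo line of a recovery.conf file.
# conninfo_host is a hypothetical helper; it assumes primary_conninfo is on one
# line and that the host value contains no spaces or quotes.
conninfo_host() {
    sed -n "s/^[[:space:]]*primary_conninfo[[:space:]]*=.*host=\([^ ']*\).*/\1/p" "$1"
}

# Usage (the path is an assumption):
#   conninfo_host /var/lib/pgsql/9.2/data/recovery.conf
```

For a cascading standby C, the printed address should be B's (192.168.100.2 in the layout described in this thread), not the master's.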

Thanks and regards,
ShreeyanshDBA Team
Shreeyansh Technologies
www.shreeyansh.com 

Hi Lucas,
It looks like the application_name parameter set in recovery.conf may be mismatched.
Please compare the synchronous_standby_names value set in the postgresql.conf of Replica C with the value used as application_name in recovery.conf.

Also, check whether async replication works without application_name in the recovery.conf of Replica C, and check the status in the pg_stat_replication view.

Thanks and regards
ShreeyanshDBA Team
Shreeyansh Technologies
www.shreeyansh.com