Re: URGENT issue: pg-xlog growing on master!


 






On Mon, Jun 10, 2013 at 12:35 PM, Niels Kristian Schjødt <nielskristian@xxxxxxxxxxxxx> wrote:

On 10/06/2013 at 16:36, bricklen <bricklen@xxxxxxxxx> wrote:

On Mon, Jun 10, 2013 at 4:29 AM, Niels Kristian Schjødt <nielskristian@xxxxxxxxxxxxx> wrote:

2013-06-10 11:21:45 GMT FATAL:  could not connect to the primary server: could not connect to server: No route to host
                Is the server running on host "192.168.0.4" and accepting
                TCP/IP connections on port 5432?

Did anything get changed on the standby or master around the time this message started occurring?
On the master, what do the following show?
show port;
show listen_addresses;

The master's IP is still 192.168.0.4?

Have you tried connecting to the master using something like:
psql -h 192.168.0.4 -p 5432 -U postgres -d postgres
 
Does that throw a useful error or warning?


It turned out that the switch port the server was connected to was faulty, so no connection between master and slave could be established. This resulted in pg_xlog building up very fast, because our system performs a lot of changes on the data we store.

I ended up running pg_archivecleanup on the master to free up some space urgently. Then I had the switch replaced with a new one. Now I'm trying to redo the streaming replication setup from scratch, but with no luck.

I can't seem to figure out which steps I need to take to get the standby server wiped and started as a streaming replica again from scratch. I tried to follow the steps, from step 6 onward, at http://wiki.postgresql.org/wiki/Streaming_Replication but the process seems to fail when I reach the point where I run psql -c "SELECT pg_stop_backup()". It just says:

NOTICE:  pg_stop_backup cleanup done, waiting for required WAL segments to be archived
WARNING:  pg_stop_backup still waiting for all required WAL segments to be archived (60 seconds elapsed)
HINT:  Check that your archive_command is executing properly.  pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.
(...)

When looking at ps aux on the master, I see the following:

postgres 30930  0.0  0.0  98412  1632 ?        Ss   15:59   0:02 postgres: archiver process   failed on 0000000200000E1B000000A9

The file mentioned is the one that it was about to archive when the standby server failed. Somehow it must still be trying to "catch up" from that file, which of course isn't there any more, since I had to remove those files in order to free up space on the HDD. Instead of trying to catch up from the last successfully archived file, I want it to start over from scratch with the replication - I just don't know how.


That is because you manually removed some xlog files, and you shouldn't ever do that. To "cancel" the archiving, the better way (IMHO) is to set archive_command to a dummy command, like:

    archive_command = '/bin/true'

And reload PostgreSQL:

    psql -c "SELECT pg_reload_conf()"

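With that, PostgreSQL will treat every segment as archived without actually copying it anywhere, so you will **have no backup at all** while the dummy command is in place. Once the backlog of pending segments has been cleared, you can set your old archive_command again and reload the server.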

BTW, check why the archive_command is not working properly (look at PostgreSQL's log files). Is it because there is no space left on disk? If so, removing some old archived files may be enough.
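
To see how large the archiving backlog is, something like the following might help. It is only a rough sketch: it relies on the fact that PostgreSQL marks each segment still waiting for archive_command with a <segment>.ready file under pg_xlog/archive_status. The function name count_ready_segments and the default PGDATA path are made up for illustration, so adjust them to your installation.

```shell
#!/bin/sh
# Count WAL segments that are still waiting to be archived.
# Each such segment has a "<segment>.ready" marker file in the
# archive_status directory; it is renamed to ".done" once
# archive_command has returned success for it.
count_ready_segments() {
    status_dir="$1"
    # find prints one line per marker file; wc -l counts the lines
    find "$status_dir" -type f -name '*.ready' | wc -l | tr -d ' '
}

# ASSUMPTION: this default data directory is only an example path.
PGDATA="${PGDATA:-/var/lib/postgresql/9.2/main}"

if [ -d "$PGDATA/pg_xlog/archive_status" ]; then
    echo "segments waiting for archive: $(count_ready_segments "$PGDATA/pg_xlog/archive_status")"
    # Show how much space is left on the filesystem holding pg_xlog
    df -h "$PGDATA/pg_xlog"
fi
```

If the count keeps growing while the archiver reports "failed on" the same segment in ps, the archive_command is still failing on that file and nothing behind it will be archived until it succeeds.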

Regards,
--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres

