Problem with 9.1 streaming replication


Hi all.

While testing a replication setup with PostgreSQL 9.1.4, I get an error after promoting the slave to master: a file under the 'base' subdirectory cannot be read, because only 0 bytes could be fetched (see the log extract at the end). Indeed, the actual file size is 0. I believe that whatever configuration mistake I may have made, such corruption should never happen, should it?

That error persists across cluster restarts. Basically, the DB is corrupted and almost nothing works. The only option is to rebuild it from a dump.

The replication itself works; I am able to start it with pg_basebackup in both directions.

I thought for a while that the error happened because I had made the mistake of not configuring wal_keep_segments (I didn't realize the default value was not merely small but actually zero). Is that realistic?

After the first attempts, I set it to a value that I believe to be generous (1024, which should mean 16 GB of WAL). After that, I had a successful failover simulation.
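For reference, the arithmetic behind that 16 GB figure can be sketched as follows. This is only a rough illustration: it assumes the default WAL segment size of 16 MB (a build compiled with a different segment size would give another result), and the function name is made up for the example.

```python
# Assumption: WAL segments have the default size of 16 MB.
WAL_SEGMENT_MB = 16


def wal_retention_gb(wal_keep_segments, segment_mb=WAL_SEGMENT_MB):
    """Approximate space (in GB) of WAL kept in pg_xlog by wal_keep_segments."""
    return wal_keep_segments * segment_mb / 1024.0


if __name__ == "__main__":
    # Value used in the setup described above.
    print(wal_retention_gb(1024))
```

With the default of 0, the same computation gives 0 GB, i.e. no extra segments are kept for the standby.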

But the error came back yesterday, with the same fatal corruption symptoms. It seems to be correlated with the size of the data being replicated. This time, it happened right after a pg_restore (dumps in custom format are around 50 MB).

The bandwidth between the servers is more than sufficient: I witnessed up to 70 MB/s with rsync.

Promotion is done with Debian's pg_ctlcluster promote, which I believe to be, like other Debian tools, a wrapper that selects the right cluster.
The application software starts after the promotion.

Any hint appreciated, thanks!

Precise version:  9.1.4-2~bpo60+1 from Debian squeeze-backports

Log extract (French locale):
2012-07-22 21:27:59 UTC LOG:  restauration terminée de l'archive
2012-07-22 21:27:59 UTC LOG:  le système de bases de données est prêt pour accepter les connexions
2012-07-22 21:27:59 UTC LOG:  lancement du processus autovacuum
2012-07-22 21:30:19 UTC ERREUR:  n'a pas pu lire le bloc 0 du fichier « base/142824/151268 » : a lu seulement 0 octets sur 8192
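
To find out which relation the file in the error message belongs to, something like the following could help. It is a minimal sketch, not a definitive tool: the function names are hypothetical, it only handles main-fork paths of the form base/<database oid>/<relfilenode> (optionally with a segment suffix such as .1), and it merely builds the catalog queries as strings. The second query would have to be run while connected to the database returned by the first.

```python
import re
from typing import List, Tuple


def parse_relation_path(path: str) -> Tuple[int, int]:
    """Split 'base/<db oid>/<relfilenode>[.<segment>]' into (db oid, relfilenode)."""
    match = re.fullmatch(r"base/(\d+)/(\d+)(?:\.\d+)?", path.strip())
    if match is None:
        raise ValueError("not a main-fork path under base/: %r" % path)
    return int(match.group(1)), int(match.group(2))


def lookup_queries(path: str) -> List[str]:
    """Build catalog queries naming the database and the relation for a path."""
    db_oid, filenode = parse_relation_path(path)
    return [
        # Run in any database of the cluster.
        "SELECT datname FROM pg_database WHERE oid = %d;" % db_oid,
        # Run in the database found by the previous query.
        "SELECT relname, relkind FROM pg_class WHERE relfilenode = %d;" % filenode,
    ]


if __name__ == "__main__":
    # Path taken from the log extract above.
    for query in lookup_queries("base/142824/151268"):
        print(query)
```

Knowing whether the file is a table or an index would at least tell whether a REINDEX could be enough or whether table data is really lost.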


--
Georges Racinet
Anybox SAS, http://anybox.fr
Office: 09 53 53 72 97 Mobile: 06 51 32 07 27
GPG: 0x33AB0A35, on public servers



--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

