On 04/15/2016 03:28 PM, Andrej Vanek wrote:
Hello, I tried to run pg_basebackup. Its return value is 1. How can I find out the reason? (I suspect that some WAL written after the backup is missing, but how can I find out the real reason? How do I fix it?)
First, it is not clear to me where you are taking the backup from: the master or the standby?
Second, there is a lot of redirection going on. What happens if you run pg_basebackup directly (without wrapping it in su - postgres -c ...) and use hardcoded values instead of shell variables?
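A minimal sketch of that test, assuming you are already in a shell as the postgres user on the standby host. The helper run_logged is hypothetical (not part of PostgreSQL), and 192.0.2.10 and /pg_data are placeholders for your real master address and target directory. It keeps stdout and stderr in separate files and reports the exit status of pg_basebackup itself instead of whatever wraps it:

```shell
#!/bin/sh
# Hypothetical helper, not part of PostgreSQL: run a command, keep its
# stdout and stderr in separate files, and report its real exit status.
run_logged() {
    out="$1"; err="$2"; shift 2
    status=0
    "$@" >"$out" 2>"$err" || status=$?
    echo "exit status: $status" >&2
    if [ "$status" -ne 0 ]; then
        echo "--- stderr ($err) ---" >&2
        cat "$err" >&2
    fi
    return "$status"
}

# Hardcoded values instead of shell variables. 192.0.2.10 and /pg_data are
# placeholders (assumptions); the -D directory must be empty or nonexistent.
# The backup is only attempted when the binary exists and we run as postgres.
if [ -x /usr/pgsql-9.5/bin/pg_basebackup ] && [ "$(id -un)" = postgres ]; then
    run_logged /tmp/basebackup.out /tmp/basebackup.err \
        /usr/pgsql-9.5/bin/pg_basebackup -h 192.0.2.10 -D /pg_data \
        -U pgreplic -P -v -X stream
fi
```

Because -P and -v write their progress and verbose messages to stderr, /tmp/basebackup.err is where any error message from pg_basebackup should end up.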
thanks, Andrej

--------------details:

environment: CentOS 6.7, postgres 9.5.1 (PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit)

I tried 2 forms of pg_basebackup (-X fetch and -X stream). Both were issued from a script:

# su - postgres -c "/usr/pgsql-9.5/bin/pg_basebackup -h ${DB_MASTER_IP} -D ${GEO_STDBY_DATA} -U pgreplic -P -v -X fetch" 2>${LOG_FILE}.stderr >> ${LOG_FILE}
# echo $?
1 <--------------pg_basebackup failed!
# cat log.stderr
# cat /var/log/cluster/geo_repair.log.err
transaction log start point: 0/E3000028 on timeline 1
WARNING: skipping special file "./pg_hba.conf"
WARNING: skipping special file "./pg_hba.conf.save"
transaction log end point: 0/E30000F8
pg_basebackup: base backup completed <------------------no reason for the pg_basebackup failure!
# cp /tmp/pg_hba.conf /tmp/postgresql.conf /pg_data/
# su - postgres -c "/usr/pgsql-9.5/bin/pg_ctl -D /pg_data/ start"
# tail /pg_data/pg_log/postgresql-Fri.log
`pg_xlog/0000000100000000000000E2' -> `../backups/arc/0000000100000000000000E2'
2016-04-15 23:15:10 CEST:pgreplic@[unknown]:[10667] WARNING: skipping special file "./pg_hba.conf"
2016-04-15 23:15:10 CEST:pgreplic@[unknown]:[10667] WARNING: skipping special file "./pg_hba.conf.save" <---------------recorded in pg_log on the master node and copied by pg_basebackup (note the time difference between the two servers)
2016-04-15 23:15:02 CEST:@:[23321] LOG: database system was interrupted; last known up at 2016-04-15 23:15:10 CEST
2016-04-15 23:15:02 CEST:postgres@postgres:[23329] FATAL: the database system is starting up
2016-04-15 23:15:03 CEST:@:[23321] LOG: entering standby mode
2016-04-15 23:15:03 CEST:@:[23321] LOG: database system was not properly shut down; automatic recovery in progress <---------something missing from pg_basebackup
2016-04-15 23:15:03 CEST:@:[23321] LOG: redo starts at 0/E3000028
2016-04-15 23:15:03 CEST:@:[23321] LOG: consistent recovery state reached at 0/E4000000
2016-04-15 23:15:03 CEST:@:[23295] LOG: database system is ready to accept read only connections
2016-04-15 23:15:03 CEST:@:[23356] LOG: started streaming WAL from primary at 0/E4000000 on timeline 1

-------second trial

# su - postgres -c "/usr/pgsql-9.5/bin/pg_basebackup -h ${DB_MASTER_IP} -D ${GEO_STDBY_DATA} -U pgreplic -P -v -X stream"
# echo $?
1
# cat /var/log/cluster/geo_repair.log.err
transaction log start point: 0/E5000028 on timeline 1
pg_basebackup: starting background WAL receiver
WARNING: skipping special file "./pg_hba.conf"
WARNING: skipping special file "./pg_hba.conf.save"
transaction log end point: 0/E50000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: could not wait for child process: No child processes <----what does this mean? I think it failed to start the process that fetches the WAL created during the backup, but neither the master node's log nor the pg_basebackup output gives any information about the reason. (max_wal_senders on the master is 10, so I see no reason for it to fail.)
postgres logs:
`pg_xlog/0000000100000000000000E4' -> `../backups/arc/0000000100000000000000E4'
2016-04-15 23:35:09 CEST:pgreplic@[unknown]:[29035] WARNING: skipping special file "./pg_hba.conf"
2016-04-15 23:35:09 CEST:pgreplic@[unknown]:[29035] WARNING: skipping special file "./pg_hba.conf.save"
2016-04-15 23:35:01 CEST:@:[28926] LOG: database system was interrupted; last known up at 2016-04-15 23:35:09 CEST
2016-04-15 23:35:01 CEST:postgres@postgres:[28938] FATAL: the database system is starting up
2016-04-15 23:35:02 CEST:@:[28926] LOG: entering standby mode
2016-04-15 23:35:02 CEST:@:[28926] LOG: database system was not properly shut down; automatic recovery in progress <------------this means something is missing from pg_basebackup
2016-04-15 23:35:02 CEST:@:[28926] LOG: redo starts at 0/E5000028
2016-04-15 23:35:02 CEST:@:[28926] LOG: consistent recovery state reached at 0/E6000000
2016-04-15 23:35:02 CEST:@:[28904] LOG: database system is ready to accept read only connections
2016-04-15 23:35:02 CEST:@:[28989] LOG: started streaming WAL from primary at 0/E6000000 on timeline 1

postgres params on master node:
log_line_prefix = '%t:%u@%d:[%p] '
logging_collector = on
wal_buffers = 16MB
max_wal_size = 200MB
log_temp_files = 1MB
max_connections = 170
shared_buffers = 512MB
effective_cache_size = 1500MB
work_mem = 48MB
log_lock_waits = on
log_min_duration_statement = 10000
shared_preload_libraries = 'pg_stat_statements'
include '/var/lib/pgsql/tmp/rep_mode.conf' # added by pgsql RA
wal_level = hot_standby
archive_mode = on
max_wal_senders = 10
hot_standby = on
wal_keep_segments = 128
archive_command = '/opt/postgres/dbconf/archive_command.sh %p %f'
wal_receiver_status_interval = 2
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
restart_after_crash = off
hot_standby_feedback = on
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general