On Tue, May 30, 2017 at 9:14 PM, Ludovic Vaugeois-Pepin <ludovicvp@xxxxxxxxx> wrote:
I ran into the issue described below with 10.0 beta. The error I got is:
pg_basebackup: could not create temporary replication slot
"pg_basebackup_2194": ERROR: replication slot "pg_basebackup_2194"
already exists
A race condition? Or maybe I am doing something wrong.
Release:
Name : postgresql10-server
Version : 10.0
Release : beta1PGDG.rhel7
Test Type:
Functional testing of a pacemaker resource agent
(https://github.com/ulodciv/pgha )
Test Detail:
During context/environment setup, pg_basebackup is invoked (in
parallel) from multiple virtual machines. The backups are then started
as asynchronously replicated hot standbys.
Platform:
Centos 7.3
Installation Method:
yum -y install https://download.postgresql.org/pub/repos/yum/testing/10/redhat/rhel-7-x86_64/pgdg-redhat10-10-1.noarch.rpm
yum -y install postgresql10-server postgresql10-contrib
Platform Detail:
Test Procedure:
Have pg_basebackup run simultaneously on multiple hosts against
the same instance, e.g.:
pg_basebackup -h test4 -p 5432 -D /var/lib/pgsql/10/data -U repl1 -Xs
Failure?
E deploylib.deployer_error.DeployerError:
postgres@test5: got exit status 1 for:
E pg_basebackup -h test4 -p 5432 -D
/var/lib/pgsql/10/data -U repl1 -Xs
E stderr: pg_basebackup: could not create temporary
replication slot "pg_basebackup_2194": ERROR: replication slot
"pg_basebackup_2194" already exists
E pg_basebackup: child process exited with error 1
E pg_basebackup: removing data directory "/var/lib/pgsql/10/data"
Test Results:
Comments:
This seems to be new with 10. I recently began testing the
pacemaker resource agent against PG 10. I never had (or noticed) this
failure with 9.6.1 and 9.6.2.
Hah, that's an interesting failure. In the slot name, the 2194 is a pid, but it's the pid of the pg_basebackup client process.
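To illustrate why that can collide, here is a minimal Python sketch. It assumes the slot name is just the fixed prefix "pg_basebackup_" plus the client-side pid, as described above. The helper names and the pids for test6 and test7 are hypothetical. The server only sees the resulting name, so two hosts whose pg_basebackup happened to get the same pid would request the same slot.

```python
from __future__ import annotations

import os


def temp_slot_name(client_pid: int) -> str:
    """Build a temporary slot name from the *client's* pid (assumed scheme)."""
    return f"pg_basebackup_{client_pid}"


def find_collisions(clients: dict[str, int]) -> dict[str, list[str]]:
    """Return each slot name requested by more than one host, with those hosts.

    `clients` maps a host name to the pid its pg_basebackup process got.
    Pids are only unique per machine, so equal pids on different hosts
    yield equal slot names on the shared server.
    """
    by_name: dict[str, list[str]] = {}
    for host, pid in clients.items():
        by_name.setdefault(temp_slot_name(pid), []).append(host)
    return {name: hosts for name, hosts in by_name.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Hypothetical pids: identically provisioned VMs running the same setup
    # steps may well hand pg_basebackup the same pid.
    clients = {"test5": 2194, "test6": 2194, "test7": 2201}
    print(temp_slot_name(os.getpid()))  # name this process would use
    print(find_collisions(clients))
    # {'pg_basebackup_2194': ['test5', 'test6']}
```

Uniqueness of such a name only holds within a single machine. Concurrent clients on separate hosts have no such guarantee.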
I assume you're not running the two pg_basebackup processes on the same machine? Is it predictable when this happens (meaning that the pid value is actually predictable), or do you have to run it a large number of times before it happens?