Hi Lonni,
Yes, I am using PG 9.1.9.
Yes, 1 slave syncing from the master
CentOS 6.4
I don't see any network or hardware issues (e.g. with the NIC), but I will look into this further. The two servers communicate over a private network and switch.
I forgot to mention that after I restart the slave, everything syncs right back up and all is working again. So if it is a network issue, replication is simply stopping after some hiccup instead of retrying and resuming once things are back up.
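
For anyone who wants to put a number on how far behind the slave is when this happens, here is a minimal sketch in Python. It assumes the positions are read by hand with SELECT pg_current_xlog_location() on the master and SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location() on the slave, and that 9.1 uses the pre-9.3 layout in which each log id covers 0xFF000000 bytes. The function names and the sample values are made up for illustration.

```python
# Sketch only: WAL position arithmetic for 9.1-style locations ("logid/offset").
# Assumption: before 9.3 the last 16 MB segment of every log id is skipped,
# so one log id spans 0xFF000000 bytes rather than 2**32.
BYTES_PER_LOGID = 0xFF000000


def xlog_location_to_bytes(location: str) -> int:
    """Convert a location such as '549/216B3730' to an absolute byte position."""
    logid_hex, offset_hex = location.strip().split("/")
    return int(logid_hex, 16) * BYTES_PER_LOGID + int(offset_hex, 16)


def lag_bytes(master_location: str, slave_location: str) -> int:
    """Bytes of WAL the slave is behind the master (0 when caught up)."""
    return xlog_location_to_bytes(master_location) - xlog_location_to_bytes(slave_location)


if __name__ == "__main__":
    # Hypothetical values, e.g. pasted from psql on each server.
    master = "549/216B3830"
    slave_received = "549/216B3730"
    print("receive lag in bytes:", lag_bytes(master, slave_received))
```

Comparing the master against the receive location and against the replay location separately should show whether WAL stops arriving or stops being applied.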
Thanks!
On Thu, Aug 15, 2013 at 11:32 AM, Lonni J Friedman <netllama@xxxxxxxxx> wrote:
I've never seen this happen. Looks like you might be using 9.1? Are
you up to date on all the 9.1.x releases?
Do you have just 1 slave syncing from the master?
Which OS are you using?
Did you verify that there aren't any network problems between the
slave & master?
Or hardware problems (like the NIC dying, or dropping packets)?
--
On Thu, Aug 15, 2013 at 11:07 AM, Andrew Berman <rexxe98@xxxxxxxxx> wrote:
> Hello,
>
> I'm having an issue where streaming replication just randomly stops working.
> I haven't been able to find anything in the logs that points to an issue,
> but the Postgres startup process shows a "waiting" status on the slave:
>
> postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 postgres: startup process recovering 000000010000053D0000003F waiting
> postgres 5642 0.0 21.4 3428356 2613252 ? Ss Aug14 0:30 postgres: writer process
> postgres 5659 0.0 0.0 177524 788 ? Ss Aug14 0:03 postgres: stats collector process
> postgres 7159 1.2 0.1 3451360 18352 ? Ss Aug14 17:31 postgres: wal receiver process streaming 549/216B3730
>
> The replication works great for days, but randomly seems to lock up and
> replication halts. I verified that the two databases were out of sync with
> a query on both of them. Has anyone experienced this issue before?
>
> Here are some relevant config settings:
>
> Master:
>
> wal_level = hot_standby
> checkpoint_segments = 32
> checkpoint_completion_target = 0.9
> archive_mode = on
> archive_command = 'rsync -a %p foo@foo:/var/lib/pgsql/9.1/wals/%f </dev/null'
> max_wal_senders = 2
> wal_keep_segments = 32
>
> Slave:
>
> wal_level = hot_standby
> checkpoint_segments = 32
> #checkpoint_completion_target = 0.5
> hot_standby = on
> max_standby_archive_delay = -1
> max_standby_streaming_delay = -1
> #wal_receiver_status_interval = 10s
> #hot_standby_feedback = off
>
> Thank you for any help you can provide!
>
> Andrew
>
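
The two WAL positions in the process listing above can be compared directly: the startup process names a segment file, while the wal receiver reports a logid/offset location. The following is a rough Python sketch of that comparison. It assumes 16 MB segments and the pre-9.3 layout of 255 segments per log id, and the helper names are hypothetical.

```python
# Sketch only: compare the segment being replayed with the streaming position.
# Assumptions: 16 MB WAL segments, 255 segments per log id (pre-9.3 layout).
SEGMENT_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_LOGID = 255


def segment_from_filename(name: str) -> tuple:
    """'000000010000053D0000003F' -> (timeline, logid, segment)."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment file name")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def segment_from_location(location: str) -> tuple:
    """'549/216B3730' -> (logid, segment containing that offset)."""
    logid_hex, offset_hex = location.strip().split("/")
    return int(logid_hex, 16), int(offset_hex, 16) // SEGMENT_SIZE


def segments_between(replay: tuple, receive: tuple) -> int:
    """Number of segments from the (logid, segment) replayed to the one received."""
    return (receive[0] - replay[0]) * SEGMENTS_PER_LOGID + (receive[1] - replay[1])


if __name__ == "__main__":
    # Values taken from the ps output quoted in this thread.
    _timeline, logid, seg = segment_from_filename("000000010000053D0000003F")
    received = segment_from_location("549/216B3730")
    print("segments received but not yet replayed:",
          segments_between((logid, seg), received))
```

Under those assumptions the receiver is 3030 segments ahead of the segment the startup process is recovering. If that reading is right, WAL is still arriving on the slave and it is the replay side that has stopped advancing.
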
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@xxxxxxxxx
LlamaLand https://netllama.linux-sxs.org