Re: Streaming Replication Randomly Locking Up

Andrew Berman <rexxe98@xxxxxxxxx> · Thu, 15 Aug 2013 11:45:42 -0700

Hi Lonni,
Yes, I am using PG 9.1.9.
Yes, one slave syncing from the master.
The OS is CentOS 6.4.
I don't see any network or hardware issues (e.g. NIC), but I will look into this further.  The two servers communicate over a private network and switch.

I forgot to mention that after I restart the slave, everything syncs right back up and all is working again.  So if it is a network issue, replication is simply stopping after some hiccup instead of retrying and resuming once the connection is back.
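
One thing that may be worth checking in the process list quoted below: the WAL receiver reports streaming at 549/216B3730, while the startup process is still recovering segment 000000010000053D0000003F. Here is a rough sketch of how far apart those two positions are. It assumes the pre-9.3 WAL layout (255 segments of 16 MB per logical file) and the default segment size; the function names are my own, not anything PostgreSQL provides. On a 9.1 slave, pg_last_xlog_receive_location() and pg_last_xlog_replay_location() report the same two positions directly.

```python
# Sketch only: position arithmetic for PostgreSQL releases before 9.3,
# where each logical WAL file holds 255 segments of 16 MB.
SEGMENT_SIZE = 16 * 1024 * 1024          # default WAL segment size (assumed)
BYTES_PER_LOGFILE = 255 * SEGMENT_SIZE   # 0xFF000000 on pre-9.3 servers


def location_to_bytes(location):
    """Convert a WAL location such as '549/216B3730' to an absolute byte offset."""
    logid, offset = location.split("/")
    return int(logid, 16) * BYTES_PER_LOGFILE + int(offset, 16)


def segment_start_bytes(segment_name):
    """Return the byte offset at which a 24-hex-digit WAL segment file begins."""
    if len(segment_name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    logid = int(segment_name[8:16], 16)    # characters 0-7 are the timeline
    segno = int(segment_name[16:24], 16)
    return logid * BYTES_PER_LOGFILE + segno * SEGMENT_SIZE


def replay_gap_bytes(receive_location, replay_segment):
    """Bytes received but not yet replayed, measured from the start of the
    segment being replayed, so the result overestimates by up to one segment."""
    return location_to_bytes(receive_location) - segment_start_bytes(replay_segment)


if __name__ == "__main__":
    gap = replay_gap_bytes("549/216B3730", "000000010000053D0000003F")
    print("approximate replay gap: %.1f GiB" % (gap / float(1024 ** 3)))
```

If that gap keeps growing while the receiver position keeps advancing, it would suggest that WAL is still arriving and that replay is the part that has stalled, rather than the network link. That is a guess on my part, not a diagnosis.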

Thanks!

On Thu, Aug 15, 2013 at 11:32 AM, Lonni J Friedman <netllama@xxxxxxxxx> wrote:

I've never seen this happen.  Looks like you might be using 9.1?  Are
you up to date on all the 9.1.x releases?
Do you have just 1 slave syncing from the master?
Which OS are you using?
Did you verify that there aren't any network problems between the
slave & master?
Or hardware problems (like the NIC dying, or dropping packets)?

On Thu, Aug 15, 2013 at 11:07 AM, Andrew Berman <rexxe98@xxxxxxxxx> wrote:

> Hello,
>
> I'm having an issue where streaming replication just randomly stops working.
> I haven't been able to find anything in the logs which point to an issue,
> but the Postgres process shows a "waiting" status on the slave:
>
> postgres  5639  0.1 24.3 3428264 2970236 ?     Ss   Aug14   1:54 postgres: startup process   recovering 000000010000053D0000003F waiting
> postgres  5642  0.0 21.4 3428356 2613252 ?     Ss   Aug14   0:30 postgres: writer process
> postgres  5659  0.0  0.0 177524   788 ?        Ss   Aug14   0:03 postgres: stats collector process
> postgres  7159  1.2  0.1 3451360 18352 ?       Ss   Aug14  17:31 postgres: wal receiver process   streaming 549/216B3730
>
> The replication works great for days, but randomly seems to lock up and
> replication halts.  I verified that the two databases were out of sync with
> a query on both of them.  Has anyone experienced this issue before?
>
> Here are some relevant config settings:
>
> Master:
>
> wal_level = hot_standby
> checkpoint_segments = 32
> checkpoint_completion_target = 0.9
> archive_mode = on
> archive_command = 'rsync -a %p foo@foo:/var/lib/pgsql/9.1/wals/%f </dev/null'
> max_wal_senders = 2
> wal_keep_segments = 32
>
> Slave:
>
> wal_level = hot_standby
> checkpoint_segments = 32
> #checkpoint_completion_target = 0.5
> hot_standby = on
> max_standby_archive_delay = -1
> max_standby_streaming_delay = -1
> #wal_receiver_status_interval = 10s
> #hot_standby_feedback = off
>
> Thank you for any help you can provide!
>
> Andrew
>
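
The slave settings quoted above set both max_standby_archive_delay and max_standby_streaming_delay to -1, which the PostgreSQL documentation describes as letting the standby wait forever for conflicting queries to complete. Whether that has anything to do with the "waiting" startup process here is only an assumption. The sketch below just shows one way to flag such settings in a config fragment. The helper names are hypothetical, and the parser is deliberately minimal (it does not handle trailing comments on a setting line).

```python
def parse_settings(text):
    """Parse 'name = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("'")
    return settings


def unbounded_standby_delays(settings):
    """Return the standby delay settings that are set to -1 (wait forever)."""
    names = ("max_standby_archive_delay", "max_standby_streaming_delay")
    return [n for n in names if settings.get(n) == "-1"]


# The slave settings as posted in this thread.
SLAVE_CONF = """
wal_level = hot_standby
checkpoint_segments = 32
#checkpoint_completion_target = 0.5
hot_standby = on
max_standby_archive_delay = -1
max_standby_streaming_delay = -1
#wal_receiver_status_interval = 10s
#hot_standby_feedback = off
"""

if __name__ == "__main__":
    for name in unbounded_standby_delays(parse_settings(SLAVE_CONF)):
        print("%s = -1: replay waits indefinitely for conflicting queries" % name)
```

With these values, a long-running query on the slave that conflicts with incoming WAL could hold up replay indefinitely while the WAL receiver keeps streaming, so it may be worth checking for long-running queries on the slave the next time this happens.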

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@xxxxxxxxx
LlamaLand                       https://netllama.linux-sxs.org