Re: Replaying 48 WAL files takes 80 minutes

"Albe Laurenz" <laurenz.albe@xxxxxxxxxx> · Tue, 30 Oct 2012 09:50:44 +0100

>> On Mon, Oct 29, 2012 at 6:05 AM, Albe Laurenz
<laurenz.albe@xxxxxxxxxx> wrote:
>>> I am configuring streaming replication with hot standby
>>> with PostgreSQL 9.1.3 on RHEL 6 (kernel 2.6.32-220.el6.x86_64).
>>> PostgreSQL was compiled from source.
>>>
>>> It works fine, except that starting the standby took for ever:
>>> it took the system more than 80 minutes to replay 48 WAL files
>>> and connect to the primary.
>>>
>>> Can anybody think of an explanation why it takes that long?

Jeff Janes wrote:
>> Could the slow log files be replaying into randomly scattered pages
>> which are not yet in RAM?
>>
>> Do you have sar or vmstat reports?

The sar reports from the time in question tell me that I read
about 350 MB/s and wrote less than 0.2 MB/s.  The disks were
fairly busy (around 90%).

Jeff Trout wrote:
> If you do not have good random io performance log replay is nearly
unbearable.
> 
> also, what io scheduler are you using? if it is cfq change that to
deadline or noop.
> that can make a huge difference.

We use the noop scheduler.
As I said, an identical system performed well in load tests.

The sar reports give credit to Jeff Janes' theory.
Why does WAL replay read much more than it writes?
I thought that pretty much every block read during WAL
replay would also get dirtied and hence written out.

I wonder why the performance is good in the first few seconds.
Why should exactly the pages that I need in the beginning
happen to be in cache?

And finally: are the numbers I observe (replay 48 files in 80
minutes) ok or is this terribly slow as it seems to me?

Yours,
Laurenz Albe

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance