Re: replication timeout in pg_basebackup


Thanks Hari Babu.

I think what is happening is that dirty pages build up quickly in the page cache for the volume I am backing up to. This triggers a flush of those dirty pages to disk. While the flush is in progress, pg_basebackup tries to fsync() a received WAL file and gets blocked.
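
One way to check this hypothesis is to watch the kernel's dirty page counters while a backup runs. The sketch below is only an illustration: it assumes a Linux host where /proc/meminfo exposes the "Dirty" and "Writeback" fields, and the helper names parse_meminfo and dirty_summary are made up for this example.

```python
import os


def parse_meminfo(text):
    """Return {field_name: integer_value} for lines such as 'Dirty:  1234 kB'."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name: <number> [unit]" lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_summary(text):
    """Pick out the two counters relevant to write-back pressure (values in kB)."""
    info = parse_meminfo(text)
    return {key: info.get(key) for key in ("Dirty", "Writeback")}


if __name__ == "__main__":
    # Assumption: Linux-style /proc/meminfo; on other systems nothing is printed.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(dirty_summary(f.read()))
```

Running this every second or so during a backup should show whether "Dirty" climbs sharply just before the fsync() stalls.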

While in this state, i.e. when the dirty page count is high, these are the results of pg_test_fsync:


# /usr/pgsql-9.2/bin/pg_test_fsync -f /backup/fsync_test
2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                      16.854 ops/sec
        fdatasync                          15.242 ops/sec
        fsync                               0.187 ops/sec
        fsync_writethrough                            n/a
        open_sync                          14.747 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                       6.137 ops/sec
        fdatasync                          14.899 ops/sec
        fsync                               0.007 ops/sec
        fsync_writethrough                            n/a
        open_sync                           1.450 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write          13.486 ops/sec
         2 *  8kB open_sync writes          6.006 ops/sec
         4 *  4kB open_sync writes          3.446 ops/sec
         8 *  2kB open_sync writes          1.400 ops/sec
        16 *  1kB open_sync writes          0.859 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close                 0.009 ops/sec
        write, close, fsync                 0.008 ops/sec

Non-Sync'ed 8kB writes:
        write                           99415.368 ops/sec


However, when no backup is running and the dirty page count is low, the same test gives these results:

# /usr/pgsql-9.2/bin/pg_test_fsync -f /backup/fsync_test
2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                    1974.243 ops/sec
        fdatasync                        1410.804 ops/sec
        fsync                             181.129 ops/sec
        fsync_writethrough                            n/a
        open_sync                         547.389 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                     290.109 ops/sec
        fdatasync                         962.378 ops/sec
        fsync                             158.987 ops/sec
        fsync_writethrough                            n/a
        open_sync                         642.309 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write        1014.456 ops/sec
         2 *  8kB open_sync writes        627.964 ops/sec
         4 *  4kB open_sync writes        340.313 ops/sec
         8 *  2kB open_sync writes        173.581 ops/sec
        16 *  1kB open_sync writes        103.236 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close               244.670 ops/sec
        write, close, fsync               207.248 ops/sec

Non-Sync'ed 8kB writes:
        write                           202216.900 ops/sec
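
To relate the two runs above to the 60 second replication timeout, the ops/sec figures can be turned into seconds per operation. This is only a rough sketch: the rates are copied from the run taken while the dirty page count was high, the function names are made up for the example, and a rate that low is based on very few completed operations, so treat the result as an order of magnitude.

```python
REPLICATION_TIMEOUT_S = 60.0  # default replication timeout mentioned in this thread

# fsync-related rates (ops/sec) from the pg_test_fsync run under high dirty page count
loaded_rates = {
    "fsync, one 8kB write": 0.187,
    "fsync, two 8kB writes": 0.007,
    "write, fsync, close": 0.009,
    "write, close, fsync": 0.008,
}


def seconds_per_op(ops_per_sec):
    """Average duration of one operation, given its measured rate."""
    return 1.0 / ops_per_sec


def exceeds_timeout(ops_per_sec, timeout_s=REPLICATION_TIMEOUT_S):
    """True if a single operation at this rate would outlast the timeout."""
    return seconds_per_op(ops_per_sec) > timeout_s


if __name__ == "__main__":
    for name, rate in loaded_rates.items():
        flag = "exceeds timeout" if exceeds_timeout(rate) else "within timeout"
        print(f"{name}: {seconds_per_op(rate):.1f} s per op ({flag})")
```

If pg_basebackup is blocked in one fsync() for that long, it cannot send a feedback packet in time, which would be consistent with the walsender terminating the connection.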



From: Haribabu Kommi [kommi.haribabu@xxxxxxxxx]
Sent: Monday, March 10, 2014 1:42 AM
To: Aggarwal, Ajay
Cc: pgsql-general@xxxxxxxxxxxxxx
Subject: Re: replication timeout in pg_basebackup

On Mon, Mar 10, 2014 at 12:52 PM, Aggarwal, Ajay <aaggarwal@xxxxxxxxxxx> wrote:
Our environment: Postgres version 9.2.2 running on CentOS 6.4

Our backups using pg_basebackup are frequently failing with the following error:
"pg_basebackup: could not send feedback packet: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request."

We are invoking pg_basebackup with these arguments: pg_basebackup -D backup_dir -X stream -l backup_dir
In the postgres logs we see this message: "terminating walsender process due to replication timeout".

Our replication timeout is the default 60 seconds. If we increase the replication timeout to, say, 180 seconds, we see better results, but backups still fail occasionally.

Running strace on the pg_basebackup process, we see that the fsync() call takes significant time and could be responsible for causing this timeout in postgres.

Use the pg_test_fsync utility, which is available in the PostgreSQL contrib modules, to test the performance of your system's sync methods.
 
Has anybody else run into the same issue? Is there a way to run pg_basebackup without fsync()?

As of now there is no such option available. I feel it is better to find out why the sync is taking so long.
 
Regards,
Hari Babu
Fujitsu Australia
