Thanks Hari Babu.
I think what is happening is that my dirty cache builds up quickly for the volume where I am backing up. This would trigger a flush of these dirty pages to the disk. While this flush is going on, pg_basebackup tries to do fsync() on a received WAL file and gets blocked.

While in this state, i.e. when the dirty page count is high, the following are the results of pg_test_fsync:

# /usr/pgsql-9.2/bin/pg_test_fsync -f /backup/fsync_test
2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                      16.854 ops/sec
        fdatasync                          15.242 ops/sec
        fsync                               0.187 ops/sec
        fsync_writethrough                            n/a
        open_sync                          14.747 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                       6.137 ops/sec
        fdatasync                          14.899 ops/sec
        fsync                               0.007 ops/sec
        fsync_writethrough                            n/a
        open_sync                           1.450 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write open_sync sizes.)
         1 * 16kB open_sync write          13.486 ops/sec
         2 *  8kB open_sync writes          6.006 ops/sec
         4 *  4kB open_sync writes          3.446 ops/sec
         8 *  2kB open_sync writes          1.400 ops/sec
        16 *  1kB open_sync writes          0.859 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different descriptor.)
        write, fsync, close                 0.009 ops/sec
        write, close, fsync                 0.008 ops/sec

Non-Sync'ed 8kB writes:
        write                           99415.368 ops/sec

However, when backups are not going on and the dirty page count is low, below are the results of this test:

# /usr/pgsql-9.2/bin/pg_test_fsync -f /backup/fsync_test
2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                    1974.243 ops/sec
        fdatasync                        1410.804 ops/sec
        fsync                             181.129 ops/sec
        fsync_writethrough                            n/a
        open_sync                         547.389 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                     290.109 ops/sec
        fdatasync                         962.378 ops/sec
        fsync                             158.987 ops/sec
        fsync_writethrough                            n/a
        open_sync                         642.309 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write open_sync sizes.)
         1 * 16kB open_sync write        1014.456 ops/sec
         2 *  8kB open_sync writes        627.964 ops/sec
         4 *  4kB open_sync writes        340.313 ops/sec
         8 *  2kB open_sync writes        173.581 ops/sec
        16 *  1kB open_sync writes        103.236 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different descriptor.)
        write, fsync, close               244.670 ops/sec
        write, close, fsync               207.248 ops/sec

Non-Sync'ed 8kB writes:
        write                          202216.900 ops/sec

From: Haribabu Kommi [kommi.haribabu@xxxxxxxxx]
Sent: Monday, March 10, 2014 1:42 AM
To: Aggarwal, Ajay
Cc: pgsql-general@xxxxxxxxxxxxxx
Subject: Re: replication timeout in pg_basebackup

On Mon, Mar 10, 2014 at 12:52 PM, Aggarwal, Ajay <aaggarwal@xxxxxxxxxxx> wrote:
Use the pg_test_fsync utility, which is available in the PostgreSQL contrib module, to test the performance of your system's sync methods.
As of now there is no such option available. I feel it is better to find out why the sync is taking so long.
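
One possible way to look into why the sync is slow is to watch the kernel's dirty page counters next to the time a single fsync() takes. The following is only a rough sketch, assuming a Linux system that exposes Dirty and Writeback in /proc/meminfo; the function names and the probe file path are hypothetical and are not part of PostgreSQL or pg_test_fsync.

```python
import os
import tempfile
import time


def parse_dirty_kb(meminfo_text):
    """Return the Dirty and Writeback values (in kB) from /proc/meminfo text."""
    wanted = {"Dirty", "Writeback"}
    result = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in wanted:
            result[name] = int(rest.split()[0])
    return result


def time_fsync(path, size=8192):
    """Write `size` bytes to `path`, fsync it, and return the fsync time in seconds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, b"\0" * size)
        start = time.monotonic()
        os.fsync(fd)
        return time.monotonic() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical probe file; point it at the volume being investigated.
    test_path = os.path.join(tempfile.gettempdir(), "fsync_probe")
    try:
        with open("/proc/meminfo") as f:
            print(parse_dirty_kb(f.read()))
    except OSError:
        print("could not read /proc/meminfo (not Linux?)")
    print("fsync took %.3f s" % time_fsync(test_path))
    os.remove(test_path)
```

Running this repeatedly during a backup and again while the system is idle would show whether slow fsync() calls line up with a large Dirty value, which is the pattern described above.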
Regards,
Hari Babu
Fujitsu Australia