On Thu, Jan 3, 2019 at 3:46 PM Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
Greetings Chuck,
* Chuck Martin (clmartin@xxxxxxxxxxxxxxxx) wrote:
> Using iperf, the transfer speed between the two servers (from the main to
> the standby) was 938 Mbits/sec. If I understand the units correctly, it is
> close to what it can be.
That does look like the rate it should be going at, but it should only
take about 2 hours to copy 750GB at that rate.
That’s what I was expecting.
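As a rough check of the two-hour figure, here is a minimal sketch of the arithmetic. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbit = 10^6 bits) and ignores protocol overhead and disk limits; the function name `transfer_hours` is illustrative only.

```python
def transfer_hours(size_gb: float, rate_mbit_s: float) -> float:
    """Hours needed to move size_gb (decimal GB) at rate_mbit_s (Mbit/s)."""
    size_bits = size_gb * 1e9 * 8              # GB -> bytes -> bits
    seconds = size_bits / (rate_mbit_s * 1e6)  # bits / (bits per second)
    return seconds / 3600


# 750 GB at the measured iperf rate of 938 Mbit/s
print(f"{transfer_hours(750, 938):.2f} hours")  # prints "1.78 hours"
```

So a copy that saturates the link should finish in a little under two hours.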
How much WAL does this system generate though...? If you're generating
a very large amount then it's possible the WAL streaming is actually
clogging up the network and causing the rate of copy on the data files
to be quite slow. You'd have to be generating quite a bit of WAL
though.
It shouldn’t be excessive, but I’ll look closely at that.
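One way to put a number on the WAL rate is to sample the current WAL position twice (for example with `SELECT pg_current_wal_lsn()` on PostgreSQL 10 or later) and compare the two values. The sketch below assumes the usual `X/Y` hexadecimal LSN text format, where the byte position is the high part shifted left by 32 bits plus the low part. The sample LSNs and the 600-second interval are made up for illustration.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN such as '2A/10000000' to an absolute byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)


def wal_rate_mbit_s(lsn_start: str, lsn_end: str, elapsed_s: float) -> float:
    """Average WAL generation rate in Mbit/s between two LSN samples."""
    wal_bytes = lsn_to_bytes(lsn_end) - lsn_to_bytes(lsn_start)
    return wal_bytes * 8 / elapsed_s / 1e6


# Hypothetical samples taken 10 minutes apart (1 GiB of WAL in between)
rate = wal_rate_mbit_s("2A/10000000", "2A/50000000", 600)
print(f"{rate:.1f} Mbit/s of WAL")  # prints "14.3 Mbit/s of WAL"
```

Comparing that figure with the 938 Mbit/s link speed shows whether WAL streaming could plausibly be crowding out the data file copy.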
> Your earlier suggestion was to do the pg_basebackup locally and rsync it
> over. Maybe that would be faster. At this point, it is saying it is 6%
> through, over 24 hours after being started.
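For scale, the progress figure quoted above can be turned into an effective throughput and a projected finish time. This is only a sketch: it assumes the 750GB size mentioned earlier in the thread, treats "over 24 hours" as exactly 24 hours, and assumes the copy keeps going at a constant rate. The function names are illustrative.

```python
def effective_rate_mbit_s(size_gb: float, fraction_done: float,
                          elapsed_hours: float) -> float:
    """Average throughput so far, in Mbit/s (decimal units)."""
    copied_bits = size_gb * 1e9 * 8 * fraction_done
    return copied_bits / (elapsed_hours * 3600) / 1e6


def projected_total_hours(fraction_done: float, elapsed_hours: float) -> float:
    """Total run time if progress continues at the same average rate."""
    return elapsed_hours / fraction_done


print(f"{effective_rate_mbit_s(750, 0.06, 24):.1f} Mbit/s")  # prints "4.2 Mbit/s"
print(f"{projected_total_hours(0.06, 24):.0f} hours")        # prints "400 hours"
```

That is well under 1% of the rate iperf measured, which suggests the raw link speed is not what is limiting the copy.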
For building out a replica, I'd tend to use my backups anyway instead of
using pg_basebackup. Provided you have good backups and reasonable WAL
retention, restoring a backup and then letting it replay WAL from the
archive until it can catch up with the primary works very well. If you
have a very high rate of WAL then you might consider taking a full
backup and then an incremental backup (which is much faster, and
reduces the WAL required to only what is generated between the start
of the incremental backup and the point where the replica has caught
up with the primary).
There are a few different backup tools out there which can do parallel
backup and in-transit compression, which loads up the primary's CPUs
with processes doing compression but should reduce the overall time if
the bottleneck is the network.
I’ll check out some solutions this weekend.
I appreciate the tips.
Chuck
Thanks!
Stephen
Chuck Martin
Avondale Software