Re: Shortest offline window on database migration

Haroldo Kerry <hkerry@xxxxxxxxxxxxx> · Sat, 1 Jun 2019 10:17:13 -0300

Jeff,
We are using the following command to dump the database:
docker exec pg-1 bash -c 'pg_dump -v -U postgres -Fc --file=/var/lib/postgresql/backup/sc2-ssd.bkp smartcenter2_prod' 2>> /var/log/sc2-bkp-ssd.log &
The bottleneck at dump is CPU (a single one, on a 44 thread server), as we are using the -Fc option, that does not allow multiple jobs.
We tried some time ago to use the --jobs option of pg_dump but it was slower, even with more threads. Our guess is the sheer volume of files outweighs the processing gains of using a compressed file output. Also pg_dump even with multiple jobs spends a lot of time (1h+) on the "reading dependency data" section that seems to be single threaded (our application is multi-tenant and we use schemas to separate tenant data, hence we have a lot of tables).
We are dumping the backup to the old array.

We are creating the replica using :
docker exec pg-2 pg_basebackup -h 192.168.0.107 -U replication -P --xlog -D /var/lib/postgresql/data_9.6 
and it is taking 1h10m , instead of the 2h I reported initially, because we were using rsync with checksums to do it, after experimenting with pg_basebackup we found out it is faster, rsync was taking 1h just to calculate all checksums. Thanks for your insight on this taking too long.

Regarding blowing the old array after the upgrade is complete:
Our plan keeps the old array data volume. We restore the backup to the new array with checksums, delete the old replica (space issues), and use the space for the new replica with checksums.
If the new array does not work as expected, we switch to the replica with the old array. If all things go wrong, we can just switch back to the old array data volume (without a replica for some time). 

I'm glad to report that we executed the plan over dawn (4h downtime in the end) and everything worked, the new array is performing as expected.
New array database volume:
4 x Intel 960GB SATA SSD D3-S4610 on a RAID 10 configuration, on a local PERC H730 Controller.
Specs:
Mfr part number: SSDSC2KG960G801
Form Factor: 2.5 inch
Latency: read - 36 µs; write - 37 µs
Random Read (100% span): 96, 000 IOPS
Random Write (100% span): 51, 000 IOPS

Old array database volume (now used for the replica server):
7 x Samsung 480GB SAS SSD PM-1633 on a Dell SC2020 Compellent iSCSI storage, with dual 1 Gbps interfaces (max bandwidth 2 Gbps)
Specs:  Sequential Read Up to 1,400 MB/s
Sequential Write Up to 930 MB/s
Random Read Up to 200,000 IOPS
Random Write Up to 37,000 IOPS 

Kind of surprisingly to us the local array outperforms the older (and more expensive) by more than 2x.

Thanks for the help.

Regards,
Haroldo Kerry

On Thu, May 30, 2019 at 10:42 PM Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Thu, May 30, 2019 at 11:08 AM Haroldo Kerry <hkerry@xxxxxxxxxxxxx> wrote:
Hello,
We are migrating our PostgreSQL 9.6.10 database (with streaming replication active) to a faster disk array.
We are using this opportunity to enable checksums, so we will have to do a full backup-restore.
The database size is about 500GB, it takes about 2h:30min for a full backup, and then about 1h to fully restore it with checksum enabled on the new array, plus 2h to recreate the replica on the old array.

As others have noticed, your "trick" won't work.  So back to basics.  Are you using the best degree of parallelization on each one of these tasks?  What is the bottleneck of each one (CPU, disk, network)? how are you creating the replica?  Can you share the actual command lines for each one?  It seems odd that the dump (which only needs to dump the index and constraint definitions) is so much slower than the restore (which actually needs to build those indexes and validate the constraints). Is that because the dump is happening from the old slow disk and restore a new fast ones?  Same with creating the replica, why is that slower than actually doing the restore?

It sounds like you are planning on blowing away the old master server on the old array as soon as the upgrade is complete, so you can re-use that space to build the new replica?  That doesn't seem very safe to me--what if during the rebuilding of the replica you run into a major problem and have to roll the whole thing back?  What will the old array which is holding the current replica server be doing in all of this?

Cheers,

Jeff

-- 
Haroldo Kerry
CTO/COO
Rua do Rócio, 220, 7° andar, conjunto 72
São Paulo – SP / CEP 04552-000
hkerry@xxxxxxxxxxxxx
www.callix.com.br