Greetings,

* Haroldo Kerry (hkerry@xxxxxxxxxxxxx) wrote:
> The bottleneck at dump is CPU (a single one, on a 44 thread server), as we
> are using the -Fc option, that does not allow multiple jobs.
> We tried some time ago to use the --jobs option of pg_dump but it was
> slower, even with more threads. Our guess is the sheer volume of files
> outweighs the processing gains of using a compressed file output. Also
> pg_dump even with multiple jobs spends a lot of time (1h+) on the "reading
> dependency data" section that seems to be single threaded (our application
> is multi-tenant and we use schemas to separate tenant data, hence we have a
> lot of tables).

You might want to reconsider using the separate-schemas-for-tenants
approach. This isn't the only annoyance you can run into with lots and
lots of tables.

That said, are you using the newer version of pg_dump (which is what you
should always be doing when migrating to a newer version of PG)? We've
improved it over time, though I can't recall off-hand if this particular
issue was improved in between the releases being discussed here. Of
course, lots of little files and dealing with them could drag down
performance when working in parallel. Still, I'm a bit surprised that
it's ending up slower than -Fc.

> We are creating the replica using:
> docker exec pg-2 pg_basebackup -h 192.168.0.107 -U replication -P --xlog -D
> /var/lib/postgresql/data_9.6
> and it is taking 1h10m, instead of the 2h I reported initially, because we
> were using rsync with checksums to do it; after experimenting with
> pg_basebackup we found out it is faster, rsync was taking 1h just to
> calculate all checksums. Thanks for your insight on this taking too long.

So, it's a bit awkward still, unfortunately, but you can use pgbackrest
to effectively give you a parallel replica build. The steps are
something like:

Get pgbackrest WAL archiving up and going, with the repo on the
destination server/filesystem, but have 'compress=n' in the
pgbackrest.conf for the repo. Run:

  pgbackrest --stanza=mydb --type=full --process-max=8 backup

Once that's done, just do:

  mv /path/to/repo/backup/mydb/20190605-120000F/pg_data /new/pgdata
  chmod -R g-rwx /new/pgdata

Then in /new/pgdata, create a recovery.conf file like:

  restore_command = 'pgbackrest --stanza=mydb archive-get %f "%p"'

And start up the DB server.

We have some ideas about how to make that whole thing cleaner, but the
rewrite into C has delayed our efforts; perhaps once that's done (this
fall) we can look at it.

Of course, you won't have an actual backup of the new database server at
that point yet, so you'll want to clean things up and make that happen
ASAP.

Another option, which is what I usually recommend, is just to take a new
backup (properly) and then do a restore from it, but that'll obviously
take longer since there are two copies being done instead of one (though
you can parallelize to your heart's content, so it can still be quite
fast if you have enough CPU and I/O).

Thanks,

Stephen
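P.S. A few sketches in case they're useful; the database name, stanza
name, paths, and job counts below are placeholders, so adjust for your
environment. For the parallel dump, --jobs needs the directory format
rather than -Fc, so the re-test would look roughly like:

  # dump with 8 parallel workers into a directory-format archive
  # (each table's data is still compressed individually by default)
  pg_dump -Fd -j 8 -f /backup/mydb.dir mydb

  # restore into the new cluster with 8 parallel workers
  pg_restore -j 8 -d mydb /backup/mydb.dir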
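For the archiving piece above, a minimal pgbackrest.conf would be
something along these lines (2.x option names; if the repo really lives
on a different host you'd also need repo1-host / pg1-host and SSH set up
between the boxes):

  [global]
  repo1-path=/path/to/repo
  compress=n

  [mydb]
  pg1-path=/path/to/pgdata

plus, on the primary, something like:

  # postgresql.conf
  archive_mode = on
  archive_command = 'pgbackrest --stanza=mydb archive-push %p'

and then a stanza-create before kicking off the full backup shown above:

  pgbackrest --stanza=mydb stanza-create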
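And for the take-a-proper-backup-then-restore route at the end, the
restore side parallelizes the same way, e.g.:

  pgbackrest --stanza=mydb --process-max=8 restore

which pulls the backup into the pg1-path configured on the new server.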