On Feb 28, 2019, at 8:04 AM, Stephen Frost <sfrost@xxxxxxxxxxx> wrote: > > Greetings, > > * Julie Nishimura (juliezain@xxxxxxxxxxx) wrote: >> Hello everybody, I am new to postgresql environment, but trying to get up to speed. >> Can you please share your experience on how you can automate refreshment of dev environment on regular basis (desirably weekly), taking for consideration some of prod dbs can be very large (like 20+ TB >> >> Any suggestions? > > The approach that I like to recommend is to have your backup/restore > solution be involved in this refreshing process, so that you're also > testing that your backup/restore process works correctly. For dealing > with larger databases, using a backup tool which has parallel backup, > parallel restore, and is able to restore just the files which are > different from the backup can make the restore take much less time (this > is what the 'delta-restore' option in pgbackrest does, and it was > specifically written to support exactly this kind of prod->dev periodic > refresh, though other tools may also support that these days). > > As mentioned elsewhere on this thread, using snapshots can also be a > good approach though you have to be sure that the snapshot is completely > atomic across all filesystems that PostgreSQL is using, or you have to > deal with running pg_start/stop_backup and putting a backup_label into > place for the restored snapshot and a recovery.conf to provide a way for > PG to get at any WAL which was generated while the snapshot (or > snapshots) was being taken. Very much yes to everything Stephen says. Regularly refreshing nonprod via your normal backup/restore process is an efficient way to test your backups, and snapshots are a great way to do backups when your data volume is greater than your churn between backups. (And at 20+ TB, I hope that's the case for you.)