Re: Backup Copy of a Production server.

Chris Travers <chris.travers@xxxxxxxxx> · Mon, 7 Aug 2023 19:30:20 +0700

On Mon, Aug 7, 2023 at 7:00 PM KK CHN <kkchn.in@xxxxxxxxx> wrote:

On Mon, Aug 7, 2023 at 10:49 AM Ron <ronljohnsonjr@xxxxxxxxx> wrote:

    On 8/7/23 00:02, KK CHN wrote:

        List,

        I am in need to copy a production PostgreSQL server  data(
          1 TB)  to  an external storage( Say USB Hard Drive) and need
          to set up a backup server with this data dir. 

        What is the trivial method to achieve this ?? 

pg_basebackup backs up the data dir. 

Alternatively you could use another tool like pgbackrest which also has some nice features regarding incremental backups, wal management etc and parallel backup and restore.

        1. Is Sqldump an option at a production server ?? (  Will
          this affect the server performance  and possible slowdown of
          the production server ? This server has a high IOPS). This
          much size 1.2 TB will the Sqldump support ? Any bottlenecks ?

A sql dump can be had with pg_dumpall, but this is a different backup strategy.  I would recommend in that case pg_dumpall -g to dump only globals (roles and tablespaces) and then pg_dump on the databases individually with format selection to either custom (if parallelism is not required) or tar (if it is).  See the pg_dump man page for details. 

The file size will depend on file format selected etc.  My naive guess for custom format would be maybe 200-400GB.  For tar format probably more (double or more) but total size depends on many factors and cannot be reliably estimated.  In rare cases, it could even be larger than your data directory.

    Whether or not there will be bottlenecks depends on how busy (CPU
    and disk load) the current server is.

pg_basebackup is limited  by the fact that it is single threaded on both sides (aside from wal), and this also limits disk I/O as well as network throughput (if you have a long fat pipe).

pg_dump is also limited by having to interpret and serialize the output, and also, if you have large text or binary fields, having to retrieve these one at a time.  Additionally you could have lock contention.

        2. Is copying the data directory from the production server
          to an external storage and replace the data dir  at a  backup
          server with same postgres version and replace it's data
          directory with this data dir copy is a viable option ?

There is a lot of complexity to doing that right.  If you want to do that, look at using pgbackrest. 

        # cp  -r   ./data  /media/mydb_backup  ( Does this affect
          the Production database server performance ??)   due to the
          copy command overhead ?

        OR  doing a WAL Replication Configuration to a standby is
          the right method to achieve this ??

That is often also used, but you need to define what you want out of a backup.  A standby will protect you from hardware failure for the most part.  It will not, without a lot of other thought and configuration, protect you from an administrator accidently dropping an important table or database.   WAL archiving and backups can help there though (and hence my recommendation for pgbackrest, which can also restore the data directory and/or wals to a standby). 

    But you say you can't establish a network connection outside the DC.  ( I can't do for a remote machine .. But I can do  a WAL replication to another host in the same network inside the DC. So that If I  do a sqldump  or Copy of Data dir of the standby server it won't affect the production server, is this sounds good  ?  )

With a good backup archive accessible from only the places it needs to be accessed, this problem goes away. 

         This is to take out the database backup outside the
          Datacenter and our DC policy won't allow us to establish a
          network connection outside the DC to a remote location for WAL
          replication . 

    If you're unsure of what Linux distro & version and Postgresql
    version that you'll be restoring the database to, then the solution
    is: 

    DB=the_database_you_want_to_backup

      THREADS=<some_number>

      cd $PGDATA 

      cp -v pg_hba.conf postgresql.conf /media/mydb_backup

      cd /media/mydb_backup

      pg_dumpall --globals-only > globals.sql

What is the relevance of  globals-only and  what this will do  ${DB}.log  // or is it  ${DB}.sql  ?    

      pg_dump --format=d --verbose --jobs=$THREADS $DB &>
      ${DB}.log      // .log couldn't get an idea what it mean 

    If you're 100% positive that the system you might someday restore to
    is exactly the same distro & version, and Postgresql
    major version, then I'd use PgBackRest.

    -- 

      Born in Arizona, moved to Babylonia.

-- 
Best Wishes,Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.
http://www.efficito.com/learn_more