Search Postgresql Archives

Re: Experience and feedback on pg_restore --data-only

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 20 Mar 2025, Dimitrios Apostolou wrote:

Rationale:

When restoring a backup in an emergency situation, it's fine to run
pg_restore as superuser and get an exact replica of the dumped db.
AFAICT pg_restore (without --data-only) is optimised for such case.

But pg_dump/restore can be used as a generic data-copying utility, and in
those cases it makes often sense to get rid of the churn and create a clean
database by running the SQL schema definition from version control, and then
copy the data for only the tables created.

For this case, I choose to run pg_restore --data-only, and run it as the user
who owns the database (dbowner), not as a superuser, in order to avoid
changes being introduced under the radar.
Another important reason I use --data-only: tablespaces. The new host has
different storage configuration and tablespaces and the schema has been
updated to reflect those. Therefore I must create the database using the
updated schema definitions on the new host.

Things that made my life hard:

* plenty of permission denials for both ALTER OWNER or SET SESSION
  AUTHORIZATION (depending on command line switches).  Both of these
  require superuser privilege, but in my case this is not really needed.
  Dbowner has CREATEROLE and is the one who creates all the roles (WITH
  SET TRUE), and their private schemata in the specific database.  Things
  would work if pg_restore did "SET ROLE" instead of "SET SESSION
  AUTHORIZATION" to switch user. Is this a straightforward change or there
  are issues I don't see?

* After each failed attempt, I need to issue a TRUNCATE table1,table2,...
  before I try again.  I wrote my own function for that. It would help if
  pg_restore would optionally truncate before COPY.  I believe it would
  require superuser privilege for it, that could achieve using the
  --superuser=username option used today for disabling the triggers.

Performance issues: (important as my db size is >5TB)

* WAL writes: I didn't manage to avoid writing to the WAL, despite having
  setting wal_level=minimal. I even wrote my own function to ALTER all
  tables to UNLOGGED, but failed with "could not change table T to
  unlogged because it references logged table".  I'm out of ideas on this
  one.

* Indices: Could pg_restore have a switch to DROP indices before each
  COPY, and re-CREATE them after, exactly as they were?  This would speed
  up the process quite a bit.


Any feedback for improving my process? Should I put these ideas somewhere as
ideas for improvement on pg_restore?

Thank you in advance,
Dimitris







[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux