On Thu, 20 Mar 2025, Dimitrios Apostolou wrote:
Rationale: When restoring a backup in an emergency situation, it's fine to run pg_restore as superuser and get an exact replica of the dumped db. AFAICT pg_restore (without --data-only) is optimised for such case. But pg_dump/restore can be used as a generic data-copying utility, and in those cases it makes often sense to get rid of the churn and create a clean database by running the SQL schema definition from version control, and then copy the data for only the tables created. For this case, I choose to run pg_restore --data-only, and run it as the user who owns the database (dbowner), not as a superuser, in order to avoid changes being introduced under the radar.
Another important reason I use --data-only: tablespaces. The new host has different storage configuration and tablespaces and the schema has been updated to reflect those. Therefore I must create the database using the updated schema definitions on the new host.
Things that made my life hard: * plenty of permission denials for both ALTER OWNER or SET SESSION AUTHORIZATION (depending on command line switches). Both of these require superuser privilege, but in my case this is not really needed. Dbowner has CREATEROLE and is the one who creates all the roles (WITH SET TRUE), and their private schemata in the specific database. Things would work if pg_restore did "SET ROLE" instead of "SET SESSION AUTHORIZATION" to switch user. Is this a straightforward change or there are issues I don't see? * After each failed attempt, I need to issue a TRUNCATE table1,table2,... before I try again. I wrote my own function for that. It would help if pg_restore would optionally truncate before COPY. I believe it would require superuser privilege for it, that could achieve using the --superuser=username option used today for disabling the triggers. Performance issues: (important as my db size is >5TB) * WAL writes: I didn't manage to avoid writing to the WAL, despite having setting wal_level=minimal. I even wrote my own function to ALTER all tables to UNLOGGED, but failed with "could not change table T to unlogged because it references logged table". I'm out of ideas on this one. * Indices: Could pg_restore have a switch to DROP indices before each COPY, and re-CREATE them after, exactly as they were? This would speed up the process quite a bit. Any feedback for improving my process? Should I put these ideas somewhere as ideas for improvement on pg_restore? Thank you in advance, Dimitris