Search Postgresql Archives

Re: Scriptable way to validate a pg_dump restore ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On Mon, 29 Jan 2024, 22:52 Adrian Klaver, <adrian.klaver@xxxxxxxxxxx> wrote:
On 1/29/24 00:12, Laura Smith wrote:
> Hi
>
> Let's say I've got a scenario where I'm doing a pg_dump replication rather than online streaming, e.g. due to air-gap or whatever.
>
> Is there a scriptable way to validate the restore ?  e.g. using doing something clever with ctid or something to ensure both the schema and all its rows were restored to the same point at which the dump was taken ?

Assuming you are using pg_restore on a non-plain text dump file then
from pg_restore.c:

* pg_restore.c

*      pg_restore is an utility extracting postgres database definitions
  *      from a backup archive created by pg_dump using the archiver
  *      interface.
  *
  *      pg_restore will read the backup archive and
  *      dump out a script that reproduces
  *      the schema of the database in terms of
  *                user-defined types
  *                user-defined functions
  *                tables
  *                indexes
  *                aggregates
  *                operators
  *                ACL - grant/revoke
  *
  * the output script is SQL that is understood by PostgreSQL
  *
  * Basic process in a restore operation is:
  *
  *      Open the Archive and read the TOC.
  *      Set flags in TOC entries, and *maybe* reorder them.
  *      Generate script to stdout
  *      Exit

Then:

        pg_restore -l -f <output_file> <dump_file>

to get the TOC mentioned above. Walk through that to verify schema is
the same in the restored database.

This will not tell you whether all the data was transferred. You will
either have to trust from pg_dump.c:

  *      pg_dump will read the system catalogs in a database and dump out a
  *      script that reproduces the schema in terms of SQL that is
understood
  *      by PostgreSQL
  *
  *      Note that pg_dump runs in a transaction-snapshot mode transaction,
  *      so it sees a consistent snapshot of the database including system
  *      catalogs. However, it relies in part on various specialized backend
  *      functions like pg_get_indexdef(), and those things tend to look at
  *      the currently committed state.  So it is possible to get 'cache
  *      lookup failed' error if someone performs DDL changes while a
dump is
  *      happening. The window for this sort of thing is from the
acquisition
  *      of the transaction snapshot to getSchemaData() (when pg_dump
acquires
  *      AccessShareLock on every table it intends to dump). It isn't
very large,
  *      but it can happen.

Or come up with way to capture the state of the data at the time of dump
and then compare to restored database. Something like Ron posted.

Right, for me, state, not just record count is what I'm interested in (for the initial full table copy part of replication). So, given the explanation about the possible per-table window, is there some property of the table that could be used to confirm that a table has made it across? 

I guess there is such a thing since the following incremental syncing would presumably need it. I had hoped the LSN was this thing, but confirmation would be great. 

Thanks, Shaheed


>
> Thanks !
>
> Laura
>
>

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux