On Mon, Jan 29, 2024 at 3:12 AM Laura Smith <n5d9xq3ti233xiyif2vp@xxxxxxxxxxxxx> wrote:
Hi
Let's say I've got a scenario where I'm doing a pg_dump replication rather than online streaming, e.g. due to air-gap or whatever.
Is there a scriptable way to validate the restore ? e.g. using doing something clever with ctid or something to ensure both the schema and all its rows were restored to the same point at which the dump was taken ?
1. Since pg_dump creates a logical copy, nothing binary like ctid will work.
2. pg_dump is designed to take a snapshot.If it doesn't, it would fail for people who do logical replication. However, no one has noticed.
That's kinda like being asked to prove that rocks always fall when you drop them. Either you trust physics, because physics has always worked, or you must watch every rock, because next time it might not fall. The analogy is slightly flawed, since we always check the pg_dump and pg_restore return codes, since something else might impact their function.
But if you still need evidence, here's what I'm doing to verify table and record counts during a 9.6 -> 14 migration. You'll have to modify it for your purpose.
Create this table and function beforehand:
CREATE TABLE dba.migration_table_counts (
location text check (location in ('96', '14'))
, table_name text
, row_count bigint
, count_time timestamp without time zone default current_timestamp
, primary key (table_name, location)
);
location text check (location in ('96', '14'))
, table_name text
, row_count bigint
, count_time timestamp without time zone default current_timestamp
, primary key (table_name, location)
);
CREATE OR REPLACE FUNCTION dba.get_table_counts(_p_source TEXT) RETURNS INTEGER
LANGUAGE plpgsql
AS $func$
DECLARE
r RECORD;
_sql TEXT;
_table_count BIGINT;
BEGIN
FOR r IN select relnamespace::regnamespace::text||'.'||relname as table_name
from pg_class cla
where relkind = 'r'
and not exists (select 1 -- excludes parent tables
from pg_inherits inh1
where inh1.inhparent = cla.oid)
and relnamespace::regnamespace::text
not in ('pg_catalog', 'information_schema', 'dba')
order by 1
LOOP
_sql := FORMAT('SELECT COUNT(*) FROM %s;', r.table_name);
RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS'), _sql;
EXECUTE _sql INTO _table_count;
--RAISE NOTICE '%', _table_count;
INSERT INTO dba.migration_table_counts (location, table_name, row_count)
VALUES (_p_source, r.table_name, _table_count);
END LOOP;
RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS'), 'Finished';
RETURN 0;
END
$func$;
LANGUAGE plpgsql
AS $func$
DECLARE
r RECORD;
_sql TEXT;
_table_count BIGINT;
BEGIN
FOR r IN select relnamespace::regnamespace::text||'.'||relname as table_name
from pg_class cla
where relkind = 'r'
and not exists (select 1 -- excludes parent tables
from pg_inherits inh1
where inh1.inhparent = cla.oid)
and relnamespace::regnamespace::text
not in ('pg_catalog', 'information_schema', 'dba')
order by 1
LOOP
_sql := FORMAT('SELECT COUNT(*) FROM %s;', r.table_name);
RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS'), _sql;
EXECUTE _sql INTO _table_count;
--RAISE NOTICE '%', _table_count;
INSERT INTO dba.migration_table_counts (location, table_name, row_count)
VALUES (_p_source, r.table_name, _table_count);
END LOOP;
RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS'), 'Finished';
RETURN 0;
END
$func$;
Run this script in a cron job that executes at the same time as your pg_dump cron job. Parameters should be for the source database.
#!/bin/bash
declare -gr Server=$1
declare -gr DB=$2
declare -gr Source=$3
if [ -z $Server ]; then echo "Requires a server name."; exit 2; fi
if [ -z $DB ]; then echo "Requires a DB name."; exit 3; fi
if [ -z $Source ]; then echo "Requires a source: 96|14."; exit 4; fi
psql -U postgres -h ${Server} $DB -Xac "DELETE FROM dba.migration_table_counts WHERE location = '$Source';"
psql -U postgres -h ${Server} $DB -Xac "select * from dba.get_table_counts('$Source');"
declare -gr Server=$1
declare -gr DB=$2
declare -gr Source=$3
if [ -z $Server ]; then echo "Requires a server name."; exit 2; fi
if [ -z $DB ]; then echo "Requires a DB name."; exit 3; fi
if [ -z $Source ]; then echo "Requires a source: 96|14."; exit 4; fi
psql -U postgres -h ${Server} $DB -Xac "DELETE FROM dba.migration_table_counts WHERE location = '$Source';"
psql -U postgres -h ${Server} $DB -Xac "select * from dba.get_table_counts('$Source');"
Run the same script on the destination server after the pg_restore is finished.
Dump the source dba.migration_table_counts then load it into the destination dba.migration_table_counts.
These two queries run on the destination server will check that all tables exist in both databases, and that the record counts are the same.
You'll have some slight variations, since the two jobs are in separate transactions. (Mine won't, since the applications will be shut down, and pg_hba.conf will block them.)
declare -gr SQL1="
with
c96 as (select table_name, row_count from dba.migration_table_counts where location = '96'),
c14 as (select table_name, row_count from dba.migration_table_counts where location = '14')
select c96.*, c14.*
from c96 full join c14 on c96.table_name = c14.table_name
where c96.table_name is null
or c14.table_name is null
order by c96.table_name, c14.table_name;"
psql $DB -ac "$SQL1"
declare -gr SQL2="
with
c96 as (select table_name, row_count from dba.migration_table_counts where location = '96'),
c14 as (select table_name, row_count from dba.migration_table_counts where location = '14')
select c96.*, c14.*, c96.row_count - c14.row_count as row_diff
from c96 inner join c14 on c96.table_name = c14.table_name
where c96.row_count != c14.row_count
order by c96.table_name;"
psql $DB -ac "$SQL2"
with
c96 as (select table_name, row_count from dba.migration_table_counts where location = '96'),
c14 as (select table_name, row_count from dba.migration_table_counts where location = '14')
select c96.*, c14.*
from c96 full join c14 on c96.table_name = c14.table_name
where c96.table_name is null
or c14.table_name is null
order by c96.table_name, c14.table_name;"
psql $DB -ac "$SQL1"
declare -gr SQL2="
with
c96 as (select table_name, row_count from dba.migration_table_counts where location = '96'),
c14 as (select table_name, row_count from dba.migration_table_counts where location = '14')
select c96.*, c14.*, c96.row_count - c14.row_count as row_diff
from c96 inner join c14 on c96.table_name = c14.table_name
where c96.row_count != c14.row_count
order by c96.table_name;"
psql $DB -ac "$SQL2"