Re: PgBackRest PTR recovery: After table drop to get dropped state

Kashif Zeeshan <kashi.zeeshan@xxxxxxxxx> · Thu, 1 Aug 2024 15:00:16 +0500

Hi

On Thu, Aug 1, 2024 at 2:54 PM KK CHN <kkchn.in@xxxxxxxxx> wrote:
List,

Not working (start EPAS server always fails): 

1. Testing PTR using  PgBackRest(2.52.1)  on RHEL9  EPAS-16, and RHEL9 ( Repo       Server) 

  When I do a PTR

1.  After doing a table drop and then
2. Noting down the time stamp and then 
3. Taking an incremental backup in hope that If I do a restore from this incr Backup, that won't  contain the  dropped table. 
4. Correct me  if I am  conceptually wrong here.  
5.  I am never successful in restoring the EPAS server in this scenario.

I know the following will work for me, why not the above one if I really want that state of cluster also  ? 

This is Working. 
 1. Create table 
2. Noting down the timestamp
3.  Taking incremental backup on RepoServer.
4. drop the created table .
5. Then stop the EPAS server and do a  PTR, by the  --set=step 3 incr backup  and target= step 2 time stamp .. It finished the pgaback restore and promote command
6. I am able to start back the  EPAS server and see the dropped table recovered there.

But If I want a PTR as in the first section it fails.. Why ? 

Thank you,
Krishane

What I have done and results obtained: 

Created a table important_table4 in my EPAS and note down the time after creation of this table it is  ( t1 :  "01-AUG-24 14:08:32.447796+05:30" )

Then I performed an Incremental backup   (incr backup: 20240729-160137F_20240801-141148I )
timestamp start/stop: 2024-08-01 14:11:48+05:30 / 2024-08-01 14:11:52+05:30

Now I dropped the table table4 from the EPAS and noted down the time 

I want to  restore the table4,, so I stopped EPAS and executed 

$ sudo -u enterprisedb pgbackrest --stanza=Demo_Repo --delta --set=20240729-160137F_20240801-141148I  --target-timeline=current --type=time  --target="01-AUG-24 14:08:32.447796+05:30" --target-action="" restore

IT WORKS AS EXPECTED .. after restarting the EPAS I am able to get the important_table4 back. 

root@service01 ~]# sudo -u enterprisedb psql edb
psql (16.3.0)
Type "help" for help.

edb=# \dt
                List of relations
 Schema |       Name       | Type  |    Owner
--------+------------------+-------+--------------
 public | important_table  | table | enterprisedb
 public | important_table2 | table | enterprisedb
 public | important_table3 | table | enterprisedb
 public | important_table4 | table | enterprisedb
(4 rows)

SO all works fine !!!! . 

But Now the PROBLEM Statement. 

1. I am dropping the table table 4 again 
edb=# \q
[root@service01 ~]# sudo -u enterprisedb psql -c "begin; drop table important_table4; commit;" edb
BEGIN
DROP TABLE
COMMIT
2 .  [root@service01 ~]# sudo -u enterprisedb psql -Atc "select current_timestamp" edb  01-AUG-24 14:23:22.085076 +05:30
Noting the time as :   (01-AUG-24 14:23:22.085076 +05:30 )

3. Now  I am performing an incremental backup after step 2  on REPO SErver ( Hoping that this latest INCR Backup is without dropped important_table4, so that a recovery of the cluster  shouldn't show the table4 again. ) 

        incr backup details. : 20240729-160137F_20240801-142433I
            timestamp start/stop: 2024-08-01 14:24:33+05:30 / 2024-08-01 14:24:36+05:30

4. Now I want to test the database recovery  after dropping the table4 in step1 to verify that my EPAS restores from the backup in step 3 and time stamp (01-AUG-24 14:23:22.085076 +05:30,   so that  the restored EPAS cluster doesn't contain the important_table4.

5.  $ sudo -u enterprisedb pgbackrest --stanza=Demo_Repo --delta   --set=20240729-160137F_20240801-142433I  --target-timeline=current --type=time  --target="01-AUG-24 14:23:22.085076+05:30" --target-action="" restore
 ------------
-------------
INFO: restore command end: completed successfully (1035ms)

ISSUE:    I am unable to get the EPAS Server in running state after step 5 

 What am I doing wrong ?  OR am I conceptually wrong ?

OUTPUT on executing step 5. 

[root@service01 ~]# sudo -u enterprisedb pgbackrest --stanza=Demo_Repo --delta --set=20240729-160137F_20240801-142433I  --target-timeline=current --type=time  --target="01-AUG-24 14:23:22.085076+05:30" --target-action="" restore

2024-08-01 14:30:03.535 P00   INFO: restore command begin 2.52.1: --delta --exec-id=82738-b5fe7415 --log-level-console=info --log-level-file=debug --pg1-path=/var/lib/edb/as16/data --pg-version-force=16 --repo1-host=10.10.20.7 --repo1-host-user=postgres --set=20240729-160137F_20240801-142433I --stanza=Demo_Repo --target="01-AUG-24 14:23:22.085076+05:30" --target-action="" --target-timeline=current --type=time
2024-08-01 14:30:03.880 P00   INFO: repo1: restore backup set 20240729-160137F_20240801-142433I, recovery will start at 2024-08-01 14:24:33
2024-08-01 14:30:03.881 P00   INFO: remove invalid files/links/paths from '/var/lib/edb/as16/data'
2024-08-01 14:30:04.567 P00   INFO: write updated /var/lib/edb/as16/data/postgresql.auto.conf
2024-08-01 14:30:04.569 P00   INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2024-08-01 14:30:04.569 P00   INFO: restore size = 75.9MB, file total = 2171
2024-08-01 14:30:04.569 P00   INFO: restore command end: completed successfully (1035ms)
[root@service01 ~]# systemctl  start edb-as-16.service

Now  If I check the server  status  :   Its dead 

[root@service01 ~]# systemctl  status edb-as-16.service
× edb-as-16.service - EDB Postgres Advanced Server 16
     Loaded: loaded (/etc/systemd/system/edb-as-16.service; disabled; preset: disabled)
     Active: failed (Result: exit-code) since Thu 2024-08-01 14:30:58 IST; 4s ago
   Duration: 228ms
    Process: 82752 ExecStartPre=/usr/edb/as16/bin/edb-as-16-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
    Process: 82757 ExecStart=/usr/edb/as16/bin/edb-postgres -D ${PGDATA} (code=exited, status=1/FAILURE)
   Main PID: 82757 (code=exited, status=1/FAILURE)
        CPU: 325ms

Aug 01 14:30:56 service01 systemd[1]: Starting EDB Postgres Advanced Server 16...
Aug 01 14:30:56 service01 edb-postgres[82757]: 2024-08-01 14:30:56 IST LOG:  redirecting log output to logging collector process
Aug 01 14:30:56 rservice01 edb-postgres[82757]: 2024-08-01 14:30:56 IST HINT:  Future log output will appear in directory "log".
Aug 01 14:30:58 service01 systemd[1]: Started EDB Postgres Advanced Server 16.
Aug 01 14:30:58 service01 systemd[1]: edb-as-16.service: Main process exited, code=exited, status=1/FAILURE
Aug 01 14:30:58 service01 systemd[1]: edb-as-16.service: Killing process 82758 (edb-postgres) with signal SIGKILL.
Aug 01 14:30:58 service01 systemd[1]: edb-as-16.service: Failed with result 'exit-code'.
[root@service01 ~]#

Any hints/guidance most welcome. 

Can you please share the DB Server log as it contains the exact error which is causing the server not to start.

Thanks