postgres complete recovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Postgres complete recovery.

Hello experts, 

i am working on postgres backup and recovery, with pg_basebackup utility.
for some reason i was not able to perform complete recovery, recovery is completed till the available wal log not not able to 
perform complete recovery  committed transaction on wal logs.

any one performed complete database recovery ? if yes, kindly share the procedure.

here is the test case i have done 

wal log setting on postgres

=====>>>  enable archive log mode for a postgres cluster

wal_level = replica                    # minimal, replica, or logical
archive_mode = off             # enables archiving; off, on, or always
archive_command = 'cp %p /pgdata/archive/5444/%f'           # command to use to archive a logfile segment

======>>>> verify archive log setting

[postgres@pgsql-lnx1 datadir_clone]$ psql -p 5444 postgres

postgres=# show archive_mode;
 archive_mode
--------------
 on
(1 row)

postgres=# show  archive_command
postgres-# ;
        archive_command
-------------------------------
 cp %p /pgdata/archive/5444/%f
(1 row)

postgres=# show wal_level;
 wal_level
-----------
 replica
(1 row)

===== >>> force manual swtich 
postgres=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/13A7EBC8
(1 row)

postgres=# \! ls -l /pgdata/archive/5444/
total 32768
-rw-------. 1 postgres postgres 16777216 Mar  4 04:53 000000010000000000000013
-rw-------. 1 postgres postgres 16777216 Mar  4 04:54 000000010000000000000014
postgres=#

 to perform hot backup, with low level api
 
 
 1. login to postgres server with psql
 2. enable start backup (select pg_start_backup('hotbackup_20190304');
 3. use os command to copy data, tablespace, wal location to backup media
 4. stop backup (select pg_stop_backup('hotbackup_20190304');)
 
 
 pg_basebackup tool .
 
 makes binary copy of database cluster files 
 it will automatically put database in backup mode and ends once backup is completed.
 it uses stream protocol
 
 
 below parameter has to be st for pg_basebackup 
 
wal_level = replica                    # minimal, replica, or logical
archive_mode = off             # enables archiving; off, on, or always
archive_command = 'cp %p /pgdata/archive/5444/%f'           # command to use to archive a logfile segment
#max_wal_senders = 10           # max number of walsender processes

modify hba_file.conf
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     postgres                                     md5
host    replication     postgres             127.0.0.1/32            md5
host    replication     postgres             ::1/128                 md5
#Allow remote replication connections from localhost, by a user with the
host replication <username> <ip adderess>/32  md5


restart database 
pg_ctl -D $PWD stop 


run pg_Basebackup 

[postgres@pgsql-lnx1 online]$ pg_basebackup -h localhost -p 5444 -D /pgbkup/backup/fsbackup/online -X stream -v > /tmp/hotbackup.log 2>&1
Password:

note -X is used to copy latest wal segment (16mb) to self start datababse after restore and recovery

view the file /tmp/hotbackup.log file to see the backup information.
[postgres@pgsql-lnx1 online]$ cat /tmp/hotbackup.log
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/19000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: write-ahead log end point: 0/19000130
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
[postgres@pgsql-lnx1 online]$




[postgres@pgsql-lnx1 online]$ pwd
/pgbkup/backup/fsbackup/online
[postgres@pgsql-lnx1 online]$ cat backup_label
START WAL LOCATION: 0/19000028 (file 000000010000000000000019)
CHECKPOINT LOCATION: 0/19000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2019-04-04 05:44:46 CDT
LABEL: pg_basebackup base backup
[postgres@pgsql-lnx1 online]$


====== >>>>>perform recovery:


create table test and insert few rows

ltuser=> create table test (eno int);
CREATE TABLE

ltuser=> insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> select count(*) from test;
count
-------
10000
(1 row)

ltuser=> insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> select count(*) from test;
count
-------
20000
(1 row)

ltuser=>  insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=>  insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> commit;
WARNING:  there is no transaction in progress
COMMIT
ltuser=> select count(*) from test;
count
-------
40000
(1 row)

ltuser=> \!  ls -lart /pgdata/archive/5444/
total 114700
drwxrwxr-x. 4 postgres postgres       30 Mar  4 04:41 ..
-rw-------. 1 postgres postgres 16777216 Mar  4 04:53 000000010000000000000013
-rw-------. 1 postgres postgres 16777216 Mar  4 04:54 000000010000000000000014
-rw-------. 1 postgres postgres 16777216 Mar  4 05:39 000000010000000000000015
-rw-------. 1 postgres postgres 16777216 Mar  4 05:43 000000010000000000000016
-rw-------. 1 postgres postgres      305 Mar  4 05:43 000000010000000000000017.00000028.backup
-rw-------. 1 postgres postgres 16777216 Mar  4 05:43 000000010000000000000017
-rw-------. 1 postgres postgres 16777216 Mar  4 05:44 000000010000000000000018
-rw-------. 1 postgres postgres      305 Mar  4 05:44 000000010000000000000019.00000028.backup
drwxrwxr-x. 2 postgres postgres     4096 Mar  4 05:44 .
-rw-------. 1 postgres postgres 16777216 Mar  4 05:44 000000010000000000000019
ltuser=> \! ls -lart /pgdata/datadir_clone/pg_wal
total 213004
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 00000001000000000000001B
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 00000001000000000000001C
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 00000001000000000000001D
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 00000001000000000000001E
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 00000001000000000000001F
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 000000010000000000000020
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 000000010000000000000021
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 000000010000000000000022
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 000000010000000000000024
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 000000010000000000000023
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 000000010000000000000025
drwx------. 19 postgres postgres     4096 Mar  4 05:39 ..
-rw-------.  1 postgres postgres 16777216 Mar  4 05:44 000000010000000000000019
-rw-------.  1 postgres postgres      305 Mar  4 05:44 000000010000000000000019.00000028.backup
drwx------.  3 postgres postgres     4096 Mar  4 05:59 .
drwx------.  2 postgres postgres       96 Mar  4 05:59 archive_status
-rw-------.  1 postgres postgres 16777216 Mar  4 06:24 00000001000000000000001A
ltuser=> insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> select count(*) from test;
count
-------
70000
(1 row)

ltuser=> \! ls -lart /pgdata/datadir_clone/pg_wal
total 213004
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 00000001000000000000001B
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 00000001000000000000001C
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 00000001000000000000001D
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 00000001000000000000001E
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 00000001000000000000001F
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 000000010000000000000020
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 000000010000000000000021
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 000000010000000000000022
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 000000010000000000000024
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 000000010000000000000023
-rw-------.  1 postgres postgres 16777216 Mar  4 03:08 000000010000000000000025
drwx------. 19 postgres postgres     4096 Mar  4 05:39 ..
-rw-------.  1 postgres postgres 16777216 Mar  4 05:44 000000010000000000000019
-rw-------.  1 postgres postgres      305 Mar  4 05:44 000000010000000000000019.00000028.backup
drwx------.  3 postgres postgres     4096 Mar  4 05:59 .
drwx------.  2 postgres postgres       96 Mar  4 05:59 archive_status
-rw-------.  1 postgres postgres 16777216 Mar  4 06:25 00000001000000000000001A  >>>>> file has latest table test table , log swith not completed so far
ltuser=> \!  ls -lart /pgdata/archive/5444/
total 114700
drwxrwxr-x. 4 postgres postgres       30 Mar  4 04:41 ..
-rw-------. 1 postgres postgres 16777216 Mar  4 04:53 000000010000000000000013
-rw-------. 1 postgres postgres 16777216 Mar  4 04:54 000000010000000000000014
-rw-------. 1 postgres postgres 16777216 Mar  4 05:39 000000010000000000000015
-rw-------. 1 postgres postgres 16777216 Mar  4 05:43 000000010000000000000016
-rw-------. 1 postgres postgres      305 Mar  4 05:43 000000010000000000000017.00000028.backup
-rw-------. 1 postgres postgres 16777216 Mar  4 05:43 000000010000000000000017
-rw-------. 1 postgres postgres 16777216 Mar  4 05:44 000000010000000000000018
-rw-------. 1 postgres postgres      305 Mar  4 05:44 000000010000000000000019.00000028.backup
drwxrwxr-x. 2 postgres postgres     4096 Mar  4 05:44 .
-rw-------. 1 postgres postgres 16777216 Mar  4 05:44 000000010000000000000019
ltuser=> \q
[postgres@pgsql-lnx1 pg_wal]$ pwd

 ==== >> so far 70000 rows are inserted in test table on ltuser database
 
 our backup was completed on 
START WAL LOCATION: 0/19000028 (file 000000010000000000000019)
CHECKPOINT LOCATION: 0/19000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2019-04-04 05:44:46 CDT   >>>>>>
LABEL: pg_basebackup base backup


===== >>> lets simulate crash 

[postgres@pgsql-lnx1 pg_wal]$ ps -ef |grep post
root      1256     1  0 Mar03 ?        00:00:00 /usr/libexec/postfix/master -w
postfix   1284  1256  0 Mar03 ?        00:00:00 qmgr -l -t unix -u
postgres  4104     1  0 Mar03 ?        00:00:02 /opt/postgres/pghome/10.6/bin/postgres -D /pgdata/datadir
postgres  4106  4104  0 Mar03 ?        00:00:03 postgres: checkpointer process
postgres  4107  4104  0 Mar03 ?        00:00:01 postgres: writer process
postgres  4108  4104  0 Mar03 ?        00:00:03 postgres: wal writer process
postgres  4109  4104  0 Mar03 ?        00:00:04 postgres: autovacuum launcher process
postgres  4110  4104  0 Mar03 ?        00:00:08 postgres: stats collector process
postgres  4111  4104  0 Mar03 ?        00:00:00 postgres: bgworker: logical replication launcher
root     12813 12760  0 04:37 pts/3    00:00:00 su - postgres
postgres 12814 12813  0 04:37 pts/3    00:00:00 -bash
postfix  14203  1256  0 05:26 ?        00:00:00 pickup -l -t unix -u
postgres 14546     1  0 05:39 pts/3    00:00:00 /opt/postgres/pghome/10.6/bin/postgres -D /pgdata/datadir_clone
postgres 14548 14546  0 05:39 ?        00:00:00 postgres: checkpointer process
postgres 14549 14546  0 05:39 ?        00:00:00 postgres: writer process
postgres 14550 14546  0 05:39 ?        00:00:00 postgres: wal writer process
postgres 14551 14546  0 05:39 ?        00:00:00 postgres: autovacuum launcher process
postgres 14552 14546  0 05:39 ?        00:00:00 postgres: archiver process   last was 000000010000000000000019
postgres 14553 14546  0 05:39 ?        00:00:00 postgres: stats collector process
postgres 14554 14546  0 05:39 ?        00:00:00 postgres: bgworker: logical replication launcher
root     15199 15135  0 06:00 pts/2    00:00:00 su - postgres
postgres 15200 15199  0 06:00 pts/2    00:00:00 -bash
postgres 15892 15200  0 06:26 pts/2    00:00:00 ps -ef
postgres 15893 15200  0 06:26 pts/2    00:00:00 grep --color=auto post
[postgres@pgsql-lnx1 pg_wal]$ kill -9 14546
===== >>>> remove files under data directory 

[postgres@pgsql-lnx1 datadir_clone]$ rm -rf *
[postgres@pgsql-lnx1 datadir_clone]$ ls -lart
total 4
drwxr-xr-x. 9 postgres postgres 4096 Mar  4 04:41 ..
drwx------. 2 postgres postgres    6 Mar  4 06:28 .

===== >>>> try to start database 

[postgres@pgsql-lnx1 datadir_clone]$ pg_ctl -D $PWD start
pg_ctl: directory "/pgdata/datadir_clone" is not a database cluster directory
[postgres@pgsql-lnx1 datadir_clone]$ pwd
/pgdata/datadir_clone
[postgres@pgsql-lnx1 datadir_clone]$ psql -p 5444
psql: could not connect to server: Connection refused
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5444"?

===== >>> let's restore from backup 

[postgres@pgsql-lnx1 datadir_clone]$ cd /pgbkup/backup/fsbackup/online/
[postgres@pgsql-lnx1 online]$ ls
backup_label  global   pg_commit_ts  pg_hba.conf    pg_logical    pg_notify    pg_serial     pg_stat      pg_subtrans  pg_twophase  pg_wal   postgresql.auto.conf  postgresql.conf.orig
base          logfile  pg_dynshmem   pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION   pg_xact  postgresql.conf
[postgres@pgsql-lnx1 online]$ cp -rp * /pgdata/datadir_clone
==== >>> create recovery command file under data directory
cat recovery.conf
restore_command = 'cp  /pgdata/archive/5444/%f %p'
==== >>> start recovery 

[postgres@pgsql-lnx1 datadir_clone]$ pg_ctl -D $PWD start
waiting for server to start....2019-04-04 06:30:40.174 CDT [16043] LOG:  listening on IPv6 address  "::1", port 5444
2019-04-04 06:30:40.174 CDT [16043] LOG:  listening on IPv4 address "127.0.0.1", port 5444
2019-04-04 06:30:40.177 CDT [16043] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5444"
2019-04-04 06:30:40.202 CDT [16044] LOG:  database system was interrupted; last known up at 2019-04-04 05:44:46 CDT
2019-04-04 06:30:40.490 CDT [16044] LOG:  starting archive recovery >>>>>> ******
2019-04-04 06:30:40.555 CDT [16044] LOG:  restored log file "000000010000000000000019" from archive
2019-04-04 06:30:40.609 CDT [16044] LOG:  redo starts at 0/19000028
2019-04-04 06:30:40.612 CDT [16044] LOG:  consistent recovery state reached at 0/19000130
2019-04-04 06:30:40.613 CDT [16043] LOG:  database system is ready to accept read only connections
cp: cannot stat ‘/pgdata/archive/5444/00000001000000000000001A’: No such file or directory
2019-04-04 06:30:40.621 CDT [16044] LOG:  redo done at 0/19000130
2019-04-04 06:30:40.662 CDT [16044] LOG:  restored log file "000000010000000000000019" from archive >>>.****
done
server started  >>>> *****
[postgres@pgsql-lnx1 datadir_clone]$ cp: cannot stat ‘/pgdata/archive/5444/00000002.history’: No such file or directory >>>*** error
2019-04-04 06:30:40.721 CDT [16044] LOG:  selected new timeline ID: 2
2019-04-04 06:30:40.792 CDT [16044] LOG:  archive recovery complete
cp: cannot stat ‘/pgdata/archive/5444/00000001.history’: No such file or directory
2019-04-04 06:30:40.906 CDT [16043] LOG:  database system is ready to accept connections >>>> *** opened database
===== >>> let's validate 


[postgres@pgsql-lnx1 datadir_clone]$ psql -p 5444
psql (10.6)
Type "help" for help.

postgres=# \l
  List of databases
   Name    |   Owner   | Encoding |   Collate   |    Ctype    |      Access privileges
-----------+-----------+----------+-------------+-------------+------------------------------
fal1dbd   | fal1admin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | fal1admin=CTc/fal1admin     +
   |           |          |             |             | =T/fal1admin                +
   |           |          |             |             | visa_system_role=c/fal1admin
fal2dbd   | postgres  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | postgres=CTc/postgres       +
   |           |          |             |             | =T/postgres                 +
   |           |          |             |             | visa_system_role=c/postgres
fal3dbd   | fal3admin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | fal3admin=CTc/fal3admin     +
   |           |          |             |             | =T/fal3admin
ltuser    | ltuser    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
postgres  | postgres  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | postgres=CTc/postgres       +
   |           |          |             |             | =T/postgres                 +
   |           |          |             |             | visa_system_role=c/postgres
template0 | postgres  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres                 +
   |           |          |             |             | postgres=CTc/postgres
template1 | postgres  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | postgres=CTc/postgres       +
   |           |          |             |             | =c/postgres
(7 rows)

postgres=# \c ltuser ltuser
You are now connected to database "ltuser" as user "ltuser".
ltuser=> \dt
  List of relations
Schema |    Name    | Type  | Owner
--------+------------+-------+--------
public | customer   | table | ltuser
public | district   | table | ltuser
public | history    | table | ltuser
public | item       | table | ltuser
public | new_order  | table | ltuser
public | order_line | table | ltuser
public | orders     | table | ltuser
public | stock      | table | ltuser
public | warehouse  | table | ltuser
(9 rows)

ltuser=>
ltuser=> exit

===== >>>> validation results 

since we lost every thing, it only performed recovery till last available log in hotbackup.
since we lost every thing, it only performed recovery till last available log in hotbackup.
latest data in current wal log is lost and there we have test table.
lost of latest wal file/log will give some data loss, need to find a way to have redundant copy on latest wall logfile




--
Regards,

#!  Pavan Kumar
----------------------------------------------
-
Sr. Database Administrator..!

NEXT GENERATION PROFESSIONALS, LLC
Cell    #  267-799-3182 #  pavan.dba27 (Gtalk)  
India   # 9000459083

Take Risks; if you win, you will be very happy. If you lose you will be Wise  

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux