Re: PITR Backups

I took several snapshots. In every case the filesystem was fine. In one case, however, recovery appeared to find pages that had not yet been written to disk, as the log below shows, and the database would not start.

Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21 00:39:43 PDTLOG: redo done at 71/99870670
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21 00:39:43 PDTWARNING: page 28905 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [11-1] 2007-06-21 00:39:43 PDTWARNING: page 13626 of relation 1663/16384/76716 did not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [12-1] 2007-06-21 00:39:43 PDTWARNING: page 28904 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [13-1] 2007-06-21 00:39:43 PDTWARNING: page 26711 of relation 1663/16384/76719 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [14-1] 2007-06-21 00:39:43 PDTWARNING: page 28900 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [15-1] 2007-06-21 00:39:43 PDTWARNING: page 3535208 of relation 1663/16384/33190 did not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [16-1] 2007-06-21 00:39:43 PDTWARNING: page 28917 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [17-1] 2007-06-21 00:39:43 PDTWARNING: page 3535207 of relation 1663/16384/33190 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [18-1] 2007-06-21 00:39:43 PDTWARNING: page 28916 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [19-1] 2007-06-21 00:39:43 PDTWARNING: page 28911 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [20-1] 2007-06-21 00:39:43 PDTWARNING: page 26708 of relation 1663/16384/76719 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [21-1] 2007-06-21 00:39:43 PDTWARNING: page 28914 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [22-1] 2007-06-21 00:39:43 PDTWARNING: page 28909 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [23-1] 2007-06-21 00:39:43 PDTWARNING: page 28908 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [24-1] 2007-06-21 00:39:43 PDTWARNING: page 28913 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [25-1] 2007-06-21 00:39:43 PDTWARNING: page 26712 of relation 1663/16384/76719 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [26-1] 2007-06-21 00:39:43 PDTWARNING: page 28918 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [27-1] 2007-06-21 00:39:43 PDTWARNING: page 28912 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [28-1] 2007-06-21 00:39:43 PDTWARNING: page 3535209 of relation 1663/16384/33190 did not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [29-1] 2007-06-21 00:39:43 PDTWARNING: page 28907 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [30-1] 2007-06-21 00:39:43 PDTWARNING: page 28906 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [31-1] 2007-06-21 00:39:43 PDTWARNING: page 26713 of relation 1663/16384/76719 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [32-1] 2007-06-21 00:39:43 PDTWARNING: page 17306 of relation 1663/16384/76710 did not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [33-1] 2007-06-21 00:39:43 PDTWARNING: page 26706 of relation 1663/16384/76719 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [34-1] 2007-06-21 00:39:43 PDTWARNING: page 800226 of relation 1663/16384/33204 did not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [35-1] 2007-06-21 00:39:43 PDTWARNING: page 28915 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [36-1] 2007-06-21 00:39:43 PDTWARNING: page 26710 of relation 1663/16384/76719 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [37-1] 2007-06-21 00:39:43 PDTWARNING: page 28903 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [38-1] 2007-06-21 00:39:43 PDTWARNING: page 28902 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [39-1] 2007-06-21 00:39:43 PDTWARNING: page 28910 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [40-1] 2007-06-21 00:39:43 PDTPANIC: WAL contains references to invalid pages
Jun 21 00:39:43 sfmedstorageha001 postgres[3503]: [1-1] 2007-06-21 00:39:43 PDTLOG: startup process (PID 3506) was terminated by signal 6
Jun 21 00:39:43 sfmedstorageha001 postgres[3503]: [2-1] 2007-06-21 00:39:43 PDTLOG: aborting startup due to startup process failure
Jun 21 00:39:43 sfmedstorageha001 postgres[3505]: [1-1] 2007-06-21 00:39:43 PDTLOG: logger shutting down
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-1] 2007-06-21 00:40:39 PDTLOG: database system was interrupted while in recovery at 2007-06-21 00:36:40 PDT
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-2] 2007-06-21 00:40:39 PDTHINT: This probably means that some data is corrupted and you will have to use the last backup for
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-3]  recovery.
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [2-1] 2007-06-21 00:40:39 PDTLOG: checkpoint record is at 71/9881E928
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [3-1] 2007-06-21 00:40:39 PDTLOG: redo record is at 71/986BF148; undo record is at 0/0; shutdown FALSE
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [4-1] 2007-06-21 00:40:39 PDTLOG: next transaction ID: 0/2871389429; next OID: 83795
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [5-1] 2007-06-21 00:40:39 PDTLOG: next MultiXactId: 1; next MultiXactOffset: 0
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [6-1] 2007-06-21 00:40:39 PDTLOG: database system was not properly shut down; automatic recovery in progress
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [7-1] 2007-06-21 00:40:39 PDTLOG: redo starts at 71/986BF148
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [8-1] 2007-06-21 00:40:39 PDTLOG: record with zero length at 71/998706A8
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [9-1] 2007-06-21 00:40:39 PDTLOG: redo done at 71/99870670
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [10-1] 2007-06-21 00:40:39 PDTWARNING: page 28905 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [11-1] 2007-06-21 00:40:39 PDTWARNING: page 13626 of relation 1663/16384/76716 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [12-1] 2007-06-21 00:40:39 PDTWARNING: page 28904 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [13-1] 2007-06-21 00:40:39 PDTWARNING: page 26711 of relation 1663/16384/76719 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [14-1] 2007-06-21 00:40:39 PDTWARNING: page 28900 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [15-1] 2007-06-21 00:40:39 PDTWARNING: page 3535208 of relation 1663/16384/33190 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [16-1] 2007-06-21 00:40:39 PDTWARNING: page 28917 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [17-1] 2007-06-21 00:40:39 PDTWARNING: page 3535207 of relation 1663/16384/33190 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [18-1] 2007-06-21 00:40:39 PDTWARNING: page 28916 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [19-1] 2007-06-21 00:40:39 PDTWARNING: page 28911 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [20-1] 2007-06-21 00:40:39 PDTWARNING: page 26708 of relation 1663/16384/76719 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [21-1] 2007-06-21 00:40:39 PDTWARNING: page 28914 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [22-1] 2007-06-21 00:40:39 PDTWARNING: page 28909 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [23-1] 2007-06-21 00:40:39 PDTWARNING: page 28908 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [24-1] 2007-06-21 00:40:39 PDTWARNING: page 28913 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [25-1] 2007-06-21 00:40:39 PDTWARNING: page 26712 of relation 1663/16384/76719 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [26-1] 2007-06-21 00:40:39 PDTWARNING: page 28918 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [27-1] 2007-06-21 00:40:39 PDTWARNING: page 28912 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [28-1] 2007-06-21 00:40:39 PDTWARNING: page 3535209 of relation 1663/16384/33190 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [29-1] 2007-06-21 00:40:39 PDTWARNING: page 28907 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [30-1] 2007-06-21 00:40:39 PDTWARNING: page 28906 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [31-1] 2007-06-21 00:40:39 PDTWARNING: page 26713 of relation 1663/16384/76719 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [32-1] 2007-06-21 00:40:39 PDTWARNING: page 17306 of relation 1663/16384/76710 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [33-1] 2007-06-21 00:40:39 PDTWARNING: page 26706 of relation 1663/16384/76719 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [34-1] 2007-06-21 00:40:39 PDTWARNING: page 800226 of relation 1663/16384/33204 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [35-1] 2007-06-21 00:40:39 PDTWARNING: page 28915 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [36-1] 2007-06-21 00:40:39 PDTWARNING: page 26710 of relation 1663/16384/76719 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [37-1] 2007-06-21 00:40:39 PDTWARNING: page 28903 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [38-1] 2007-06-21 00:40:39 PDTWARNING: page 28902 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [39-1] 2007-06-21 00:40:39 PDTWARNING: page 28910 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [40-1] 2007-06-21 00:40:39 PDTPANIC: WAL contains references to invalid pages
Jun 21 00:40:39 sfmedstorageha001 postgres[3755]: [1-1] 2007-06-21 00:40:39 PDTLOG: startup process (PID 3757) was terminated by signal 6
Jun 21 00:40:39 sfmedstorageha001 postgres[3755]: [2-1] 2007-06-21 00:40:39 PDTLOG: aborting startup due to startup process failure
Jun 21 00:40:39 sfmedstorageha001 postgres[3756]: [1-1] 2007-06-21 00:40:39 PDTLOG: logger shutting down
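
With this many warnings it can help to group them by relation before reading further. The following is a minimal sketch and not part of the original post: the function name summarize_invalid_pages and the short sample excerpt are illustrative, and the pattern assumes only the two warning forms that appear in the log above ("was uninitialized" and "did not exist"). Each relation is printed as it appears in the log, which PostgreSQL writes as tablespace/database/relfilenode.

```python
import re
from collections import defaultdict

# Matches the two invalid-page warning forms seen in the log above.
# \s+ is used between tokens so a warning split across a line break still matches.
PAGE_WARNING = re.compile(
    r"WARNING:\s+page\s+(\d+)\s+of\s+relation\s+(\d+/\d+/\d+)\s+"
    r"(was uninitialized|did not exist)"
)

def summarize_invalid_pages(log_text):
    """Group invalid-page warnings by relation, dropping repeated pages."""
    pages = defaultdict(set)
    for page, relation, _reason in PAGE_WARNING.findall(log_text):
        pages[relation].add(int(page))
    return {relation: sorted(nums) for relation, nums in pages.items()}

# A short excerpt copied from the log above; the last line is the same page
# reported again by the second startup attempt.
sample = """\
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21 00:39:43 PDTWARNING: page 28905 of relation 1663/16384/76718 was uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [11-1] 2007-06-21 00:39:43 PDTWARNING: page 13626 of relation 1663/16384/76716 did not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [12-1] 2007-06-21 00:39:43 PDTWARNING: page 28904 of relation 1663/16384/76718 was uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [10-1] 2007-06-21 00:40:39 PDTWARNING: page 28905 of relation 1663/16384/76718 was uninitialized
"""

summary = summarize_invalid_pages(sample)
for relation, nums in sorted(summary.items()):
    print(f"{relation}: {len(nums)} invalid page(s): {nums}")
```

Run against the full log, a summary like this makes it easier to see that most of the warnings fall on a small number of relations.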




On Jun 25, 2007, at 6:26 AM, Simon Riggs wrote:

On Mon, 2007-06-25 at 19:06 +0900, Koichi Suzuki wrote:

Yeah, I agree we should carefully follow how Dan really did a backup.

My point is PostgreSQL may have to extend the file during the hot backup
to write to the new block.

If the snapshot is a consistent, point-in-time copy then I don't see how any I/O at all makes a difference. To my knowledge, both EMC and NetApp
produce snapshots like this. IIRC, EMC calls these instant snapshots,
NetApp calls them frozen snapshots.

It is slightly different from Oracle's case.
Oracle allocates all the database space in advance, so there is no risk
of modifying the metadata on the fly.

Not really sure it's different.

Oracle allows dynamic file extensions and I've got no evidence that file
extension is prevented from occurring during backup simply as a result
of issuing the start hot backup command.

Oracle and DB2 both support a stop-I/O-to-the-database mode. My
understanding is that isn't required any more if you do an instant
snapshot, so if people are using instant snapshots it should certainly
be the case that they are safe to do this with PostgreSQL also.

Oracle is certainly more picky about snapshotted files than PostgreSQL
is. In Oracle, each file has a header with the LSN of the last
checkpoint in it. This is used at recovery time to ensure the backup is
consistent by having exactly equal LSNs across all files. PostgreSQL
doesn't use file headers and we don't store the LSN on a per-file basis,
though we do store the LSN in the control file for the whole server.
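
The per-file check described above can be pictured with a toy model. This is only a sketch of the idea as stated in this message, not Oracle's actual file format or recovery code; the file names, the helper names parse_lsn and files_agree, and the LSN values (borrowed from the log format earlier in the thread) are all hypothetical.

```python
def parse_lsn(text):
    """Turn an LSN written as 'hi/lo' in hex into a comparable (hi, lo) tuple."""
    hi, lo = text.split("/")
    return int(hi, 16), int(lo, 16)

def files_agree(header_lsns):
    """Per-file model: consistent only if every file header holds the same LSN."""
    return len({parse_lsn(lsn) for lsn in header_lsns.values()}) == 1

# Hypothetical file names and LSNs, for illustration only.
consistent = {"users01.dbf": "71/9881E928", "index01.dbf": "71/9881E928"}
mismatched = {"users01.dbf": "71/9881E928", "index01.dbf": "71/986BF148"}

print(files_agree(consistent), files_agree(mismatched))  # prints: True False
```

In the PostgreSQL model described here there is no such per-file comparison to make, since the LSN is recorded once in the control file for the whole server.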

In our case, because a SAN-based storage snapshot is taken at the device
level, not the file system level, even the file system does not know that
a snapshot is being taken, so we might encounter a case where metadata
and/or user data are not consistent. Such a snapshot (of the whole file
system) might be corrupted and cause file-system-level errors.

I'm interested in this. Any further comment/opinion is welcome.

If you can show me either

i) an error that occurs after the full and correct PostgreSQL hot backup
procedures have been executed, or

ii) a conjecture that explains in detail how a device level
error might occur

then I will look into this further.
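
For readers unfamiliar with the hot backup procedure referred to in point i), its core is to bracket the snapshot between pg_start_backup() and pg_stop_backup(). The following is a minimal sketch under stated assumptions, not the procedure anyone in this thread actually ran: snapshot_with_hot_backup, run_psql and the take_snapshot hook are hypothetical names, psql is assumed to connect with default settings, and WAL archiving (which the full procedure also needs so that the segments written during the backup are available at restore time) is left out. The function names match PostgreSQL releases of this era; PostgreSQL 15 renamed them to pg_backup_start() and pg_backup_stop() and requires both calls in the same session, so this shape would not work there.

```python
import re
import subprocess

def run_psql(sql):
    """Run one SQL statement via psql; raises CalledProcessError on failure."""
    subprocess.run(["psql", "-X", "-At", "-c", sql], check=True)

def snapshot_with_hot_backup(label, take_snapshot, run_sql=run_psql):
    """Bracket a storage snapshot with pg_start_backup / pg_stop_backup."""
    # Keep the label simple so it can be embedded in the SQL literal safely.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", label):
        raise ValueError("label must contain only letters, digits, '_' or '-'")
    run_sql(f"SELECT pg_start_backup('{label}')")
    try:
        take_snapshot()  # hypothetical hook: trigger the storage snapshot here
    finally:
        # Leave backup mode even if the snapshot step fails.
        run_sql("SELECT pg_stop_backup()")

# Dry run: record the steps instead of touching a server or a storage array.
steps = []
snapshot_with_hot_backup(
    "nightly", lambda: steps.append("<snapshot>"), run_sql=steps.append
)
print(steps)
```

Passing the SQL runner and the snapshot step in as arguments keeps the ordering logic separate from site-specific commands, so the sequence can be checked without a live database.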

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com





