On 10/06/11 10:21 AM, Sean Laurent wrote:
> We've been running into a particularly strange problem that I'm trying
> to better understand. The super short version is that our application
> servers lose their connection to the database when I run a backup
> during periods of higher load and fail to reconnect.
>
> Here's an overview of the setup:
> - PostgreSQL 9.0.1 hosted on a cc1.4xlarge Amazon EC2 instance running
>   CentOS 5.6
> - 8 disk RAID-0 array of EBS volumes used for primary data storage
> - 4 disk RAID-0 array of EBS volumes used for transaction logs
> - Root partition is ext3
> - RAID arrays are xfs
>
> Backups are taken using a script that runs the following workflow:
> - Tell Postgres to start a backup: SELECT pg_start_backup('RAID backup');
> - Run "xfs_freeze" on the primary RAID array
> - Tell Amazon to take snapshots of each of the EBS volumes
> - Run "xfs_freeze -u" to thaw the primary RAID array
> - Run "xfs_freeze" on the transaction log RAID array
> - Tell Amazon to take snapshots of each of the EBS volumes
> - Run "xfs_freeze -u" to thaw the transaction log RAID array
> - Tell Postgres the backup is finished: SELECT pg_stop_backup();
> - Remove old WAL files
>
> The whole process takes roughly 7 seconds on average. The RAID arrays
> are frozen for roughly 2 seconds on average.

While xfs_freeze is in effect, all writes to that filesystem are
blocked. This is NOT what you want to do here. Postgres does NOT expect
you to take an atomic snapshot of the database files. Rather,
bracketing your backup with pg_start_backup and pg_stop_backup puts
things in a state where a file-by-file backup will be fine.
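
For illustration, a file-level backup bracketed this way could be
sketched roughly as below in Python. This is only a sketch under
assumptions that are not in the original post: psql picks up its
connection settings from the environment, GNU tar 1.16 or later is in
use (where exit status 1 means a file changed while it was being read),
and the names file_level_backup, run_command, data_dir and archive are
hypothetical.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical runner type: takes an argv list and the exit codes to accept.
Runner = Callable[[Sequence[str], Sequence[int]], None]


def run_command(argv: Sequence[str], ok_codes: Sequence[int] = (0,)) -> None:
    """Run a command and raise if its exit status is not in ok_codes."""
    result = subprocess.run(list(argv))
    if result.returncode not in ok_codes:
        raise subprocess.CalledProcessError(result.returncode, list(argv))


def file_level_backup(
    data_dir: str,
    archive: str,
    label: str = "file backup",
    run: Runner = run_command,
) -> None:
    """Copy data_dir into a tar archive between pg_start_backup and pg_stop_backup."""
    # Double any single quotes so the label is a valid SQL string literal.
    safe_label = label.replace("'", "''")
    run(["psql", "-c", f"SELECT pg_start_backup('{safe_label}');"], (0,))
    try:
        # No filesystem freeze: files may change while tar reads them.
        # Assumption: GNU tar >= 1.16 reports that case with exit status 1.
        run(["tar", "-czf", archive, "-C", data_dir, "."], (0, 1))
    finally:
        # Always leave backup mode, even if the copy step failed.
        run(["psql", "-c", "SELECT pg_stop_backup();"], (0,))
```

The try/finally is there so that pg_stop_backup still runs if the copy
fails. The WAL written between the two calls is still needed to restore
from such an archive.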

From the xfs_freeze man page:

    xfs_freeze halts new access to the filesystem and creates a stable
    image on disk. xfs_freeze is intended to be used with volume
    managers and hardware RAID devices that support the creation of
    snapshots.

    The mount-point argument is the pathname of the directory where the
    filesystem is mounted. The filesystem must be mounted to be frozen
    (see mount(8)).

    The -f flag requests the specified XFS filesystem to be frozen from
    new modifications. When this is selected, all ongoing transactions
    in the filesystem are allowed to complete, new write system calls
    are halted, other calls which modify the filesystem are halted, and
    all dirty data, metadata, and log information are written to disk.
    Any process attempting to write to the frozen filesystem will block
    waiting for the filesystem to be unfrozen.

When Postgres's writer processes block on the frozen filesystem, I
suspect things go sour fast.
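
One rough way to see how long a write stalls is to time a small write
plus fsync against a file on the affected mount while the backup script
runs. The Python below is a minimal sketch, not something from the
original setup: timed_write and the probe path are hypothetical names,
and it only measures a single append from an unrelated process, not
what the Postgres backends themselves experience.

```python
import os
import time


def timed_write(path: str, payload: bytes = b"x") -> float:
    """Append payload to path, fsync it, and return the elapsed seconds."""
    start = time.monotonic()
    # Opening with O_CREAT may itself modify the filesystem, so it is timed too.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # force the data to disk
    finally:
        os.close(fd)
    return time.monotonic() - start
```

Calling timed_write in a loop on a probe file under the RAID mount point
and logging the durations should show whether writes pause for about as
long as the freeze lasts.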
--
john r pierce N 37, W 122
santa cruz ca mid-left coast