On Fri, Oct 7, 2011 at 12:36 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>
> Sean Laurent <sean@xxxxxxxxxxxxx> writes:
> > We've been running into a particularly strange problem that I'm trying to
> > better understand. The super short version is that our application servers
> > lose their connection to the database when I run a backup during periods of
> > higher load and fail to reconnect.
>
> That's just weird. It sounds like the "xfs_freeze" operation, or the
> snapshotting operation, is somehow interrupting network traffic. I'd
> not expect such a thing on a normal server, but who knows what's
> connected to what in an Amazon EC2 instance?
>
> Anyway, I'd suggest trying to instrument something to prove or disprove
> that there's a networking failure involved. It might be as simple as
> watching "ping" behavior ...

Agreed, it's very weird. EBS volumes are effectively network-attached
storage, so blaming network connectivity was my first inclination as well.
Unfortunately, it's definitely not a network failure:

- The AWS support team has not detected any network outages affecting the
  EC2 instance or the EBS volumes at any time remotely near when our
  outages occurred.
- I can consistently ping the database instance from the application
  servers while the problem is occurring.
- I can SSH into the database instance and access Postgres while the
  problem is occurring.

--
Sean Laurent
Director of Operations
StudyBlue, Inc.

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
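
A ping only shows that ICMP gets through. One way to take the "instrument something" suggestion a step further is to probe the Postgres TCP port itself from an application server while the backup runs. The following is a minimal sketch under stated assumptions, not something from the thread: the function names `probe_tcp` and `watch` are made up for illustration, and a successful TCP connect only shows that something is accepting connections on that port, not that Postgres can complete a login or answer a query.

```python
import socket
import time
from datetime import datetime, timezone


def probe_tcp(host, port, timeout=2.0):
    """Attempt one TCP connect and return (ok, elapsed_seconds, detail).

    Hypothetical helper for illustration. It checks reachability of the
    listening port only; it does not speak the Postgres protocol.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError subclasses on refusal,
        # timeout, or name-resolution failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, "connected"
    except OSError as exc:
        detail = "{}: {}".format(type(exc).__name__, exc)
        return False, time.monotonic() - start, detail


def watch(host, port, interval=1.0, count=None):
    """Probe repeatedly and print one timestamped line per attempt.

    count=None loops until interrupted; a number stops after that many probes.
    """
    attempts = 0
    while count is None or attempts < count:
        ok, elapsed, detail = probe_tcp(host, port)
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        status = "OK" if ok else "FAIL"
        print("{} {}:{} {} {:.1f}ms {}".format(
            stamp, host, port, status, elapsed * 1000, detail), flush=True)
        attempts += 1
        time.sleep(interval)
```

Running something like `watch("db.example.internal", 5432)` (the hostname is a placeholder; 5432 is the Postgres default port) on an application server before starting the backup would give a timestamped log to line up against the freeze and snapshot steps. If the probes keep succeeding while the application still cannot reconnect, that would point away from the network and toward the server or the client's connection handling.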