Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections

Craig Ringer <ringerc@xxxxxxxxxxxxx> · Tue, 11 Oct 2011 13:04:44 +0800

On 11/10/11 12:48, John R Pierce wrote:
> On 10/10/11 7:44 PM, Craig Ringer wrote:
>> If blocking writes causes a server failure that persists once writes
>> have been unblocked, that's a bug IMO. You might have a bit of a backlog
>> of writes to clear, but after that all should be well, and if it isn't
>> then something needs fixing.
> 
> the process is blocked waiting for this disk write to complete,
> meanwhile, the packets are queuing up and waiting for service.
> 
> best of luck with all that....

Holding an xfs_freeze just long enough to take a snapshot doesn't take
long, or it shouldn't, anyway. Even if it did, that shouldn't cause a
server failure that persists after disk I/O has resumed, though it might
cause individual connections to drop.

I can `kill -STOP' Pg, or unplug my network cable for several seconds
and expect everything to resume just fine when I `kill -CONT' or plug
back in. Packets will be buffered by the OS if Pg is busy or by the
closest router if the network is unplugged, and will be delivered when
the server becomes reachable and responsive again. If that takes too
long or if too many packets arrive, packets will be dropped, in which
case TCP will retransmit them. If the outage is protracted enough the
client might
eventually decide the peer has gone away and drop the connection, but
even then new connections should be established to the server just fine
once it resumes responding.
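
To illustrate the `kill -STOP' / `kill -CONT' case, here is a minimal
Python sketch, not anything from Pg itself. A tiny echo server stands in
for a Pg backend (that stand-in, and the names ECHO_SERVER, recv_exact
and stop_cont_roundtrip, are assumptions made up for this example). The
client writes while the server process is stopped, the OS holds the data,
and the reply arrives once the process is continued. It assumes a POSIX
system where SIGSTOP and SIGCONT are available.

```python
import os
import signal
import socket
import subprocess
import sys
import time

# Hypothetical stand-in for a Pg backend: a one-connection echo server.
ECHO_SERVER = """
import socket
srv = socket.socket()
srv.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
srv.listen(1)
print(srv.getsockname()[1], flush=True)
conn, _ = srv.accept()
while True:
    data = conn.recv(4096)
    if not data:
        break
    conn.sendall(data)
"""


def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def stop_cont_roundtrip(pause_seconds=2.0):
    """Send to a stopped server; return (answered_while_stopped, reply)."""
    msg = b"sent while stopped"
    child = subprocess.Popen([sys.executable, "-c", ECHO_SERVER],
                             stdout=subprocess.PIPE, text=True)
    try:
        port = int(child.stdout.readline())
        with socket.create_connection(("127.0.0.1", port), timeout=10) as client:
            os.kill(child.pid, signal.SIGSTOP)   # like `kill -STOP'
            client.sendall(msg)                  # queued by the OS, not yet read
            client.settimeout(0.5)
            try:
                client.recv(4096)
                answered_while_stopped = True
            except socket.timeout:
                answered_while_stopped = False
            time.sleep(pause_seconds)
            os.kill(child.pid, signal.SIGCONT)   # like `kill -CONT'
            client.settimeout(10)
            reply = recv_exact(client, len(msg))
        return answered_while_stopped, reply
    finally:
        child.kill()
        child.wait()
        child.stdout.close()


if __name__ == "__main__":
    print(stop_cont_roundtrip())
```

The short 0.5 second timeout is only there to show that nothing comes
back while the process is stopped; a real client would normally just
block, or time out after a much longer interval.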

It is totally unreasonable for Pg to *stay* nonfunctional once disk I/O
resumes. Existing connections should receive responses they're waiting
on or die, depending on how long it's been, and new connections should
be accepted fine.

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)