Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception

Magnus Hagander <magnus@xxxxxxxxxxxx> · Fri, 29 Aug 2008 10:00:42 +0200

david@xxxxxxx wrote:
> On Thu, 28 Aug 2008, Scott Marlowe wrote:
>>> wait a min here, postgres is supposed to be able to survive a complete
>>> box failure without corrupting the database, if killing a process can
>>> corrupt the database it sounds like a major problem.
>>
>> Yes it is a major problem, but not with postgresql.  It's a major
>> problem with the linux OOM killer killing processes that should not be
>> killed.
>>
>> Would it be postgresql's fault if it corrupted data because my machine
>> had bad memory?  Or a bad hard drive?  This is the same kind of
>> failure.  The postmaster should never be killed.  It's the one thing
>> holding it all together.
> 
> the ACID guarantees that postgres is making are supposed to mean that
> even if the machine dies, the CPU goes up in smoke, etc, the
> transactions that are completed will not be corrupted.
> 
> if killing the process voids all the ACID protection then something is
> seriously wrong.
> 
> it may lose transactions that are in flight, but it should not corrupt
> the database.

AFAIK, it's not the killing of the postmaster that's the problem. The
backends will continue running and *not* corrupt anything, because the
shared memory and locking stick around between them.

The issue is if you manage to start a *new* postmaster against the same
data directory. But there's a whole bunch of safeguards against that, so
it certainly shouldn't be something you manage to do by mistake.
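
To make one of those safeguards concrete: the postmaster writes a lock
file called postmaster.pid into the data directory, with its PID on the
first line. Below is a rough Python sketch of how you could look at that
file yourself before trying to bring things back up. The function names
are made up for illustration, it assumes a POSIX system, and it only
covers the PID part. As far as I understand it, the real server also
checks whether the old shared memory segment is still in use by leftover
backends, which this sketch does not attempt.

```python
import os
from pathlib import Path


def recorded_postmaster_pid(data_dir):
    """Return the PID stored on the first line of postmaster.pid, or None.

    Hypothetical helper: returns None if the file is missing, unreadable,
    empty, or does not start with a usable number.
    """
    pid_file = Path(data_dir) / "postmaster.pid"
    try:
        first_line = pid_file.read_text().splitlines()[0]
        pid = abs(int(first_line.strip()))
    except (OSError, IndexError, ValueError):
        return None
    # Never return 0: os.kill(0, ...) would address our own process group.
    return pid if pid > 0 else None


def process_alive(pid):
    """Check whether a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 tests for existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def postmaster_running(data_dir):
    """True if postmaster.pid names a process that currently exists."""
    pid = recorded_postmaster_pid(data_dir)
    return pid is not None and process_alive(pid)
```

Note that a False result here only says the recorded postmaster process
is gone. It says nothing about orphaned backends, so it is not on its own
a reason to remove the lock file and start a new postmaster.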

I may end up being corrected by someone who knows more, but that's how
I've understood it works. Meaning it is safe against the OOM killer,
except that it requires manual work to come back up. But it shouldn't
corrupt your data.

//Magnus