Craig A. James wrote:
By the way, in spite of my questions and concerns, I was *very*
impressed by the recovery process. I know it might seem like old hat to
you guys to watch the WAL in action, and I know on a theoretical level
it's supposed to work, but watching it recover 150 separate databases,
and find and fix a couple of problems was very impressive. It gives me
great confidence that I made the right choice to use Postgres.
Richard Huxton wrote:
2. Why didn't the database recover? Why are there two processes
that couldn't be killed?
I'm guessing it didn't recover *because* there were two processes that
couldn't be killed. Responsibility for that falls to the
operating system. I've seen it most often with faulty drivers or
hardware that's being communicated with/written to. However, see below.
It can't be a coincidence that these were the only two processes in a
SELECT operation. Does the server disable signals at critical points?
If a "kill -9" as root doesn't get rid of them, I think I'm right in
saying that it's a kernel-level problem rather than something else.
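
For what it's worth, SIGKILL can't be caught, blocked or ignored by a
process, so the server can't be "disabling" it. A process that survives
"kill -9" is usually stuck inside the kernel in uninterruptible sleep
(state "D"). Here is a rough sketch of how you could check that,
assuming a Linux box with /proc mounted (the thread doesn't say which OS
this is, so that part is my assumption). The function names are made up
for illustration; only the /proc/<pid>/stat layout is standard.

```python
def parse_proc_state(stat_line):
    """Return the one-letter state code from a /proc/<pid>/stat line."""
    # Layout is "pid (comm) state ppid ...". The command name may itself
    # contain spaces or parentheses, so split on the LAST ')' instead of
    # splitting the whole line on whitespace.
    close = stat_line.rfind(")")
    if close == -1:
        raise ValueError("not a /proc/<pid>/stat line")
    fields = stat_line[close + 1:].split()
    if not fields:
        raise ValueError("state field missing")
    return fields[0]


def is_uninterruptible(stat_line):
    """True if the process is in uninterruptible sleep (state 'D')."""
    return parse_proc_state(stat_line) == "D"


def read_stat(pid):
    """Read the raw stat line for a pid (Linux only, assumes /proc)."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read()


if __name__ == "__main__":
    import sys

    # Usage: python check_state.py <pid> [<pid> ...]
    for arg in sys.argv[1:]:
        line = read_stat(int(arg))
        print(arg, parse_proc_state(line),
              "uninterruptible" if is_uninterruptible(line) else "")
```

If the two stuck backends show "D", the signal is pending and will only
be acted on once whatever kernel call they are blocked in returns, which
would point at the driver or hardware rather than at Postgres.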
--
Richard Huxton
Archonet Ltd