Re: How to debug a locked backend ?

Richard Huxton <dev@xxxxxxxxxxxx> · Fri, 18 Nov 2005 15:29:59 +0000

Csaba Nagy wrote:
Richard, Martijn,

Thanks for answering, but I had to kill the process in the meantime. I
tried kill -11 hoping it would at least produce a core dump, but either
it didn't dump core or I don't know where to look for it, as I can't
find one.

In any case, this is the second time I have experienced such a lock-up
this week, so I will definitely need to find out what's going on.

I would exclude hardware failure, as it happened with exactly the same
process, involving exactly the same queries/table and the same failure
symptoms, which is not characteristic of hardware failures (those should
be more random).

So, in order to find out what's going on, what should I do if it happens
again ? Use gdb, and do what ?
Strace is a good idea, I'll do that too if there is a next time.

Well, I've had time to read your previous message too.

The first time, you seemed to imply the machine slowed down across all
processes (ssh etc.). Was that the case this time?

When you say "locked", do you mean it was waiting on locks, using all
the CPU, unresponsive, or just taking a long time to run the query?
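
For the first case (waiting on locks), one rough way to check from the
shell is to ask pg_locks for lock requests that have not been granted.
This is only a sketch: the function names are made up for illustration,
it assumes psql can connect with whatever arguments you pass it, and it
relies only on the pid, mode and granted columns of pg_locks, which is
worth checking against your server version.

```shell
#!/bin/sh
# Hypothetical helper names; not part of PostgreSQL.

# Print the query that lists lock requests still waiting to be granted.
waiting_locks_sql() {
    echo "SELECT pid, mode, granted FROM pg_locks WHERE NOT granted;"
}

# Run the query through psql. Extra arguments (database name, -U user,
# -h host, ...) are passed straight through to psql.
show_waiting_locks() {
    psql -X -c "$(waiting_locks_sql)" "$@"
}
```

For example, "show_waiting_locks mydb" (mydb being a placeholder). If it
returns no rows while the backend still looks stuck, the backend is
probably not waiting on a lock and CPU or I/O is the next thing to look
at.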

To prepare for next time I'd:
1. Leave an ssh session logged in and run screen to get three sessions.
2. Leave "top" running in the first - that'll show you process
activity/general load.
3. Run "vmstat 10" in the second - that'll show you overall
memory/swap/disk/cpu usage, one report every 10 seconds.
4. The third session is then free to work in, if neither of the first
two shows anything useful.
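
If you would rather have something on disk to look at afterwards, steps
2 and 3 can be approximated non-interactively. The following is a
minimal sketch, not a tested tool: snapshot_backend is a made-up name,
the output directory layout is arbitrary, and the ps/top options assume
the usual Linux procps versions.

```shell
#!/bin/sh
# Hypothetical helper: save a snapshot of system state for one backend PID.
# Usage: snapshot_backend PID [OUTDIR]
snapshot_backend() {
    snap_pid=$1
    snap_out=${2:-./backend-snapshot-$(date +%Y%m%d-%H%M%S)}
    mkdir -p "$snap_out" || return 1

    # State of the backend itself. In the STAT column, R = running,
    # S = sleeping, D = uninterruptible sleep (usually waiting on I/O).
    ps -o pid,stat,pcpu,pmem,etime,args -p "$snap_pid" \
        > "$snap_out/ps.txt" 2>&1

    # Overall memory/swap/disk/cpu usage: three samples, one second apart.
    if command -v vmstat >/dev/null 2>&1; then
        vmstat 1 3 > "$snap_out/vmstat.txt" 2>&1
    fi

    # Process activity/general load: a single batch-mode iteration of top.
    if command -v top >/dev/null 2>&1; then
        top -b -n 1 > "$snap_out/top.txt" 2>&1
    fi

    # Tell the caller where the files went.
    echo "$snap_out"
}
```

Call it as "snapshot_backend 12345" with the PID of the stuck backend
(12345 is just a placeholder) and keep the resulting directory to
compare against the next lock-up.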

--
  Richard Huxton
  Archonet Ltd