Sergey Konoplev wrote:
Don't forget to cc: the list.
Try not to top-post replies, it's easier to read if you reply below the
text you're replying to.
Thanx for your advice. I'm just absolutely worned out. Sorry.
Know that feeling - let's see if we can't sort this out.
1. Is it always the same query?
2. Does the client still think it's connected?
3. Is that query using up CPU, or just idling?
4. Anything odd in pg_locks for the problem pid?
1. No it isn't. I have few functions (plpgsql, plpython) that cause
such situations more often than another but they are called more often
also.
OK, so there's no real pattern. That would suggest it's not a particular
query-plan that's got something wrong.
Do you always get this problem inside a function?
As far as I remember I do.
Hmm - check Magnus' thoughts on pl/python. Can't comment on Python
myself. Are you sure it's not always the same few function(s) that cause
this problem?
2. The client just waits for query and buzz.
3. They are using CPU in usual way and their pg_lock activity seems normal.
So the backend that appears "stuck" is still using CPU?
Yes but the metter is that this procedures usualy use CPU just a
little so I can't find out if there is some oddity or not.
OK, so it's not that it's stuck in a loop wasting a lot of CPU
So - the symptoms are:
[snip]
Exactly.
So - we need to solve two mysteries
1. Why are these functions not returning?
2. Why does SIGINT not interrupt them?
Are you happy that your hardware and drivers are OK? There aren't
problems with any other servers on this machine?
Yes I'm quite happy. My hardware is: 2 double-core Xeon, 8Gb RAM,
RAID5. What about other software... it's dedicated PG server so I have
no problem with it.
Well, the places I'd look would be:
1. Hardware (you're happy that's fine, and it's not quite the problems
I'd expect)
2. Drivers (same as #1)
3. Client connectivity (but you say the client is fine)
4. External interactions (see below)
5. Bug in PG extension (pl/python)
6. Bug in PG core code
Do any of your functions interact with the outside world - fetch
webpages or similar? It could be they're waiting for that. If you're
using a library that could hang waiting for a response and also block
SIGINT at the same time that would explain everything.
--
Richard Huxton
Archonet Ltd
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org/