Unfortunately, we only have 8.3 - is SIGTERM safe on 8.3?On 8.4, you can use pg_terminate_backend(), which sends a SIGTERM instead of pg_cancel_backend's SIGINT. If you're not on 8.4, you can just do this manually with a "kill -SIGTERM backend_pid". If that doesn't work either, you might have to resort to a "kill -SIGKILL backend_pid". Killing a single backend should be much better for you than restarting Postgres entirely. These operations shouldn't result in database corruption.Second, and the more complicated one - what do I do about rogue queries that are running when my process starts? Today we had a query that ran since yesterday. I called pg_cancel_backend() on it several times and waited for almost two hours - to no avail. Eventually I had to ask our sysadmin to shutdown PostgreSQL, which took some five minutes, but eventually worked. Is there a way to do the same thing to a single process without shutting down the whole server, and without causing any harm to the database or memory corruption? Something I can call from within SQL? I run the nightly script from a linux user which is not "postgres", so I'd prefer a way that doesn't require using "kill".
You have to be database superuser to use pg_cancel_backend() or pg_terminate_backend(), or have a shell login as the database user to use "kill". No way around that for now.I guess I'll have to sudo or use local ssh.
Yeah, I think any of the kill modes on the backends should be "safe" in terms of your data. If you're interested in killing without having to SSH in, you could play with something like:
http://wiki.postgresql.org/wiki/Backend_killer_function
and adjust it as needed (permissions, etc) for your needs.
Though next time you see a query which doesn't respond to pg_cancel_backend(), try gathering information about the query and what the backend is doing; either you're doing something unusual (e.g. an app is restarting the query automatically after getting canceled) or perhaps you've stumbled on a bug in Postgres.
I'd appreciate it if you tell me what to look for. It was running a join on several tables, but nothing too complicated. It may be that the query is not optimized (one of the tables is not indexed properly) but it still should respond to cancel - shouldn't it?
I came across an old thread discussing a problem which sounds similar to yours (process not dying with a pg_cancel_backend()):
http://archives.postgresql.org/pgsql-general/2007-10/msg01696.php
there's some good info throughout that thread, including the recommendation from Tom Lane to try to reproduce, assuming your query isn't stuck inside a plpython or similar function.
If you can reproduce your hung query which doesn't respond to a pg_cancel_backend(), try following instructions here (assuming your server is a Linux/BSD machine) to gather more information about what's going on inside the backend:
http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
strace, top, and/or vmstat information might also be useful.
Hope this helps,
Josh