Tom Lane wrote:
> Hmm, can you attach to the stuck backend and the vacuum worker process
> with gdb and get stack traces from them? The pg_locks view does not
> indicate any locking problem, but I'm wondering if there could be a
> deadlock at the LWLock level.

My reply seems to have been lost in the ether. Anyway, I fixed the low
fsm settings and managed to replicate the problem in two separate
instances. The problem does not appear to be autovacuum, as I was
able to observe the process hanging long after autovacuum had
finished. Perhaps the vacuuming tasks were getting stuck before because
of the too-low fsm settings?
The situation now is that only the loading process hangs, while its
backend on the server shows <IDLE> in transaction. So it is definitely the
loading program that is hanging, not the Postgres server.
pg_stat_activity
2701646 | wdb | 26359 | 2701645 | wdb | <IDLE> in transaction | f | | 2009-02-18 23:57:59.619868+00 | 2009-02-18 23:57:58.461848+00 | | -1
Backtrace from postgres process
#0 0x00002ad9ed3fef15 in recv () from /lib/libc.so.6
#1 0x000000000053ba38 in secure_read ()
#2 0x0000000000542700 in pq_comm_reset ()
#3 0x0000000000542b47 in pq_getbyte ()
#4 0x00000000005b648d in prepare_for_client_read ()
#5 0x00000000005b6d7a in PostgresMain ()
#6 0x000000000058c34b in ClosePostmasterPorts ()
#7 0x000000000058d06e in PostmasterMain ()
#8 0x00000000005444f5 in main ()
Backtrace from gribLoad
#0 0x00002b2ab43c2c8f in poll () from /lib/libc.so.6
#1 0x00002b2ab47cc4af in PQmblen () from /usr/lib/libpq.so.4
#2 0x00002b2ab47cc590 in pqWaitTimed () from /usr/lib/libpq.so.4
#3 0x00002b2ab47cbe72 in PQgetResult () from /usr/lib/libpq.so.4
#4 0x00002b2ab47cbf4e in PQgetResult () from /usr/lib/libpq.so.4
#5 0x00002b2ab32a0556 in pqxx::connection_base::prepared_exec () from /usr/lib/libpqxx-2.6.8.so
#6 0x00002b2ab32be6ed in pqxx::transaction_base::prepared_exec () from /usr/lib/libpqxx-2.6.8.so
#7 0x00002b2ab32b2486 in pqxx::prepare::invocation::exec () from /usr/lib/libpqxx-2.6.8.so
#8 0x00002b2ab2d9b4cc in wdb::database::WriteValue::operator() () from /usr/lib/libwdbLoaderBase.so.0
#9 0x00002b2ab2da27d8 in pqxx::connection_base::perform<wdb::database::WriteValue> () from /usr/lib/libwdbLoaderBase.so.0
#10 0x00002b2ab2d99ddb in wdb::database::LoaderDatabaseConnection::loadField () from /usr/lib/libwdbLoaderBase.so.0
#11 0x00000000004182f0 in log4cpp::CategoryStream::operator<< <char [13]> ()
#12 0x00000000004073e8 in ?? ()
#13 0x000000000040819f in ?? ()
#14 0x00002b2ab431e4ca in __libc_start_main () from /lib/libc.so.6
#15 0x000000000040665a in ?? ()
#16 0x00007ffff7e3d6c8 in ?? ()
#17 0x0000000000000000 in ?? ()
Whatever weirdness happens appears to always occur at this point in the
process (previous stack traces we've taken point to the same insert
statement), but the timing is seemingly random: it can occur
right away, or the loading can run dozens of times before getting
stuck. I am rather at a loss to explain this. We've loaded literally
millions of rows with this code, so the functionality is hardly
untested. Is it something we are doing, or could we have hit upon some
concurrency issue in libpq or the pqxx transactors?
Any hints or tips to help identify the problem would be appreciated.
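
One way to make a hang like this visible instead of silent is to wait on the connection's socket with a finite timeout. The following is only a minimal sketch in C, assuming a POSIX system; wait_readable is a hypothetical helper name, not part of libpq or libpqxx. With plain libpq the descriptor would come from PQsocket(conn), and the wait would be combined with the asynchronous calls PQconsumeInput() and PQisBusy(). Whether the same can be arranged through libpqxx 2.6.8 has not been checked here.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: wait until fd reports an event or timeout_ms passes.
 * Returns 1 if poll() reported an event on fd (data, hangup or error),
 * 0 on timeout, and -1 on failure with errno set. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    for (;;) {
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc > 0)
            return 1;           /* something is pending on fd */
        if (rc == 0)
            return 0;           /* nothing arrived within timeout_ms */
        if (errno != EINTR)
            return -1;          /* genuine failure */
        /* Interrupted by a signal: retry. For simplicity this restarts
         * the full timeout rather than the remaining time. */
    }
}
```

A caller that gets 0 back can log the event (statement, elapsed time, connection state) and decide whether to keep waiting, which at least narrows down when and where the wait stops making progress.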
Strangely, if one attaches strace to the loading process (not the postgres
process), the poll() call on which the process may have been
hanging for hours returns, and the process just carries on as if
nothing had happened. Has anyone seen this kind of thing happen before?
Regards,
Michael A.