Hi,
We have been running into a strange problem with the loading of large
objects on the production servers of a system we are about to roll out.
At various times (usually separated by a few days), the loading process
against the server will suddenly either take an extremely long time
(going from the usual 15-20 minutes up to 45 minutes or more) or simply
hang altogether (i.e., after 8+ hours we have not had the patience to
wait and see whether it eventually finishes). This is on an 8.3.3
server running on Debian Etch; most of the previous testing occurred on
8.1.10, where we did not encounter this problem (though we had other
issues on that platform).
I have run some tests loading large objects into a PostgreSQL database
(8.3.3) with a more rapid turnaround than in production (loading,
deleting, vacuuming, and then starting the load over again) and have
been able to provoke what seems to be the same issue there, usually
within 12 hours.
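For reference, the delete/vacuum part of that cycle looks roughly like
the following (a simplified libpqxx sketch; the test_blobs table and its
lo column are placeholder names rather than our actual schema, and the
loading part is sketched further below):

#include <pqxx/pqxx>

// Simplified sketch of the delete/vacuum half of the test cycle.
// "test_blobs" and its "lo" column are placeholder names.
void clean_up(pqxx::connection &conn)
{
    {
        pqxx::work txn(conn);
        // Unlink the large objects referenced by the table, then clear it.
        txn.exec("SELECT lo_unlink(lo) FROM test_blobs");
        txn.exec("DELETE FROM test_blobs");
        txn.commit();
    }
    // VACUUM cannot run inside a transaction block, hence the nontransaction.
    pqxx::nontransaction vac(conn);
    vac.exec("VACUUM ANALYZE");
}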
Essentially, we have a C++ program (using pqxx) loading a large number
of large objects (somewhere around 80,000) that exhibits strange
behaviour. It varies quite a lot in loading time (from 1600 to 2500
seconds), and at random times it will simply hang (at least it seems
so, given that it has not progressed at all after 7-8 hours). It
displays this behaviour irrespective of whether we load all 80,000
objects in one go or break the loading up into smaller consecutive
batches of 1,000 blobs at a time (though it does seem easier to provoke
the problem with the former approach). It can run fine for hours at a
time and then suddenly lock up (in the latest test I did, it loaded the
data correctly 22 times in a row before locking up on the 23rd
run-through).
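The core of the loading loop itself is essentially along these lines
(again simplified; the in-memory blob data and the placeholder
test_blobs table stand in for what the production loader actually does,
with one transaction committed per batch of 1,000 blobs):

#include <algorithm>
#include <cstddef>
#include <pqxx/pqxx>
#include <string>
#include <vector>

// Simplified sketch of the batch loader: each batch of blobs is written
// as large objects and committed in its own transaction. The table
// "test_blobs" is a placeholder for the real schema.
void load_blobs(pqxx::connection &conn, const std::vector<std::string> &blobs)
{
    const std::size_t batch = 1000;
    for (std::size_t i = 0; i < blobs.size(); i += batch)
    {
        pqxx::work txn(conn);
        const std::size_t end = std::min(blobs.size(), i + batch);
        for (std::size_t j = i; j < end; ++j)
        {
            pqxx::largeobjectaccess lo(txn);  // creates a new large object
            lo.write(blobs[j]);               // write the blob contents into it
            txn.exec("INSERT INTO test_blobs (lo) VALUES ("
                     + pqxx::to_string(lo.id()) + ")");
        }
        txn.commit();
    }
}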
The pattern seems to be identical each time; ps aux shows the process
(wdb) idle in transaction, with no apparent activity on the system:
postgres 19965  0.0  0.0  72628  5584 ?  S   Feb17  0:01 /usr/lib/postgresql/8.3/bin/postgres -D /var/lib/postgresql/8.3/main -c config_file=/etc/postgresql/8.3/main/postgresql.conf
postgres 19967  0.4  0.0  72748 27636 ?  Ss  Feb17  6:38 postgres: writer process
postgres 19968  0.0  0.0  72628  1784 ?  Ss  Feb17  0:09 postgres: wal writer process
postgres 19969  0.0  0.0  72940  1924 ?  Ss  Feb17  0:00 postgres: autovacuum launcher process
postgres 19970  0.0  0.0  42352  1736 ?  Ss  Feb17  0:02 postgres: stats collector process
postgres 16617  0.0  0.0  95156 10948 ?  Ss  08:42  0:00 postgres: autovacuum worker process   wdb
postgres 18834  0.0  0.1 118632 37320 ?  Ss  09:06  0:01 postgres: wdb wdb [local] idle in transaction
A core dump of the program that we took earlier seemed to indicate that
it was waiting for the database to return from the insert statement.
top indicates that both postgres and the application are idle (no load
on the server whatsoever).
pg_locks and pg_stat_activity seem to indicate that there may be some
issue with autovacuum?
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |           mode           | granted
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+--------------------------+---------
 relation   |    16385 |     2683 |      |       |            |               |         |       |          | 2/1122117          | 16617 | RowExclusiveLock         | t
 relation   |    16385 |    10969 |      |       |            |               |         |       |          | 3/1105             | 27895 | AccessShareLock          | t
 relation   |    16385 |     2613 |      |       |            |               |         |       |          | 2/1122117          | 16617 | ShareUpdateExclusiveLock | t
 virtualxid |          |          |      |       | 3/1105     |               |         |       |          | 3/1105             | 27895 | ExclusiveLock            | t
 virtualxid |          |          |      |       | 2/1122117  |               |         |       |          | 2/1122117          | 16617 | ExclusiveLock            | t
(5 rows)
 datid | datname | procpid | usesysid | usename  |                    current_query                      | waiting |          xact_start           |          query_start          |         backend_start         | client_addr | client_port
-------+---------+---------+----------+----------+-------------------------------------------------------+---------+-------------------------------+-------------------------------+-------------------------------+-------------+-------------
 16385 | wdb     |   16617 |       10 | postgres | autovacuum: VACUUM ANALYZE pg_catalog.pg_largeobject   | f       | 2009-02-18 08:42:45.066922+00 | 2009-02-18 08:42:45.066922+00 | 2009-02-18 08:42:45.065857+00 |             |
 16385 | wdb     |   27895 |    16384 | wdb      | select * from pg_stat_activity ;                        | f       | 2009-02-18 12:21:47.925813+00 | 2009-02-18 12:21:47.925813+00 | 2009-02-18 11:09:31.201507+00 |             |          -1
 16385 | wdb     |   18834 |    16384 | wdb      | <IDLE> in transaction                                   | f       |                               | 2009-02-18 09:06:26.479633+00 | 2009-02-18 09:06:23.443698+00 |             |          -1
(3 rows)
Any leads as to how to figure out the problem or alternatively how to
work around it would be much appreciated.
I noticed that the disk on the test machine seemed to be running low on
disk space, caused by an error in the FSM settings, but this was not
the case on the production systems, so I assume that this is not the
problem here either (I will fix it for future test runs, though).
Regards,
Michael Akinde