We have a 9.1.11 backend (Ubuntu 12.04 x86_64, m1.medium EC2 instance) which seems to be stuck at COMMIT for 2 days now: mydb=# SELECT procpid, waiting, current_query, CURRENT_TIMESTAMP - query_start AS query_elapsed, CURRENT_TIMESTAMP - xact_start AS xact_elapsed FROM pg_stat_activity WHERE procpid != pg_backend_pid() AND current_query != '<IDLE>'; -[ RECORD 1 ]-+----------------------- procpid | 6061 waiting | f current_query | COMMIT; query_elapsed | 2 days 08:59:17.619142 xact_elapsed | 3 days 15:48:10.739912 The transaction behind that COMMIT has been the only thing running on this Postgres instance for the past 3 days or so, since Postgres was started on that machine. I spun the EC2 instance for this database up solely to test a database subsetting process, which is what the transaction was doing before it got stuck at COMMIT -- using a bunch of DELETEs and ALTER TABLE ... DROP|ADD CONSTRAINTs to delete 90% or so of our data in order to be able to pg_dump a slimmed-down development copy. The EC2 instances we use have separate EBS-backed volumes for the PostgreSQL data and WAL directories. The backend in question seems to be stuck reading a ton of data from the data partition: the monitoring for those EBS volumes shows those volumes have been hammered reading a constant aggregate 90MB/sec since that COMMIT started. The write bandwidth to the postgresql-data partition has been almost nil since the COMMIT, and there has been no read/write activity on the WAL volumes. Here, we can see that backend has managed to read 22 TB despite the fact that the entire database is only 228 GB on disk. $ sudo cat /proc/6061/io rchar: 24505414843923 wchar: 23516159014 syscr: 2991395854 syscw: 2874613 read_bytes: 24791719338496 write_bytes: 22417580032 cancelled_write_bytes: 221208576 $ df -h /dev/md0 /dev/md1 Filesystem Size Used Avail Use% Mounted on /dev/md0 480G 228G 253G 48% /mnt/ebs/postgresql-data /dev/md1 32G 20G 13G 61% /mnt/ebs/postgresql-wal Running an strace on the backend shows a whole ton of read() calls and the occasional lseek(). I grabbed a backtrace of the backend with gdb, attached. Attached also are the non-default pg_settings for this instance. You'll notice that fsync, full_page_writes, and autovacuum are all off: this is intentional, since this instance is transient and has nothing important on it. There are no interesting errors in the Postgres log files since it was spun up. Any ideas on how to further diagnose or avoid this problem? Josh
(gdb) bt #0 0x00007fa235a73fe0 in read () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00000000006249fd in read (__nbytes=8192, __buf=0x7fa21f8d7ee0, __fd=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/unistd.h:45 #2 FileRead (file=<optimized out>, buffer=0x7fa21f8d7ee0 "RI", amount=8192) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/storage/file/fd.c:1191 #3 0x000000000063d856 in mdread (reln=<optimized out>, forknum=<optimized out>, blocknum=253211, buffer=0x7fa21f8d7ee0 "RI") at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/storage/smgr/md.c:646 #4 0x0000000000620556 in ReadBuffer_common (smgr=<optimized out>, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=253211, mode=RBM_NORMAL, strategy=<optimized out>, hit=0x7fffa31cb7df "") at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/storage/buffer/bufmgr.c:440 #5 0x0000000000621014 in ReadBufferExtended (reln=0x18678a0, forkNum=MAIN_FORKNUM, blockNum=253211, mode=RBM_NORMAL, strategy=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/storage/buffer/bufmgr.c:246 #6 0x000000000046ff81 in heapgetpage (scan=0x2202340, page=253211) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/heap/heapam.c:230 #7 0x000000000047049a in heapgettup_pagemode (scan=0x2202340, dir=<optimized out>, nkeys=0, key=0x0) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/heap/heapam.c:824 #8 0x0000000000471506 in heap_getnext (scan=0x2202340, direction=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/heap/heapam.c:1349 #9 0x000000000058a50b in SeqNext (node=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/nodeSeqscan.c:66 #10 0x0000000000579fa3 in ExecScanFetch (recheckMtd=0x58a4d0 <SeqRecheck>, accessMtd=0x58a4e0 <SeqNext>, node=0x2201590) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execScan.c:82 #11 ExecScan (node=0x2201590, accessMtd=0x58a4e0 <SeqNext>, recheckMtd=0x58a4d0 <SeqRecheck>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execScan.c:167 #12 0x0000000000572a98 in ExecProcNode (node=0x2201590) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execProcnode.c:394 #13 0x000000000058587a in ExecLockRows (node=0x22013a0) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/nodeLockRows.c:54 #14 0x0000000000572948 in ExecProcNode (node=0x22013a0) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execPro---Type <return> to continue, or q <return> to quit--- cnode.c:486 #15 0x000000000056fd62 in ExecutePlan (dest=0xafdea0, direction=<optimized out>, numberTuples=1, sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x22013a0, estate=0x22011e0) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execMain.c:1442 #16 standard_ExecutorRun (queryDesc=0x21ff220, direction=<optimized out>, count=1) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execMain.c:316 #17 0x000000000059341d in _SPI_pquery (tcount=<optimized out>, fire_triggers=0 '\000', queryDesc=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/spi.c:2110 #18 _SPI_execute_plan (paramLI=0x21ff1d0, snapshot=<optimized out>, crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=0 '\000', tcount=1, plan=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/spi.c:1922 #19 0x000000000059386e in SPI_execute_snapshot (plan=0x204a7d0, Values=0x7fffa31cbcc0, Nulls=0x7fffa31cbed0 " ", snapshot=0x0, crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=0 '\000', tcount=1) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/spi.c:462 #20 0x00000000006c8d50 in ri_PerformCheck (qkey=0x7fffa31cbf90, qplan=0x204a7d0, fk_rel=0x18678a0, pk_rel=0x1868c00, old_tuple=0x7fffa31cca00, new_tuple=0x0, detectNewRows=1 '\001', expect_OK=5, constrname=0x7fffa31cc0d4 "api_apirecordconv_rec_submission_id_fkey") at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/utils/adt/ri_triggers.c:3427 #21 0x00000000006ca1f5 in RI_FKey_noaction_del (fcinfo=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/utils/adt/ri_triggers.c:818 #22 0x0000000000555fda in ExecCallTriggerFunc (trigdata=0x7fffa31cc9d0, tgindx=0, finfo=0x202c9b0, instr=0x0, per_tuple_context=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/commands/trigger.c:1831 #23 0x000000000055668f in AfterTriggerExecute (per_tuple_context=0x2038280, instr=0x0, finfo=0x202c9b0, rel=0x1868c00, event=0x21983a8, trigdesc=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/commands/trigger.c:3369 #24 afterTriggerInvokeEvents (events=0x1617ee0, firing_id=13, estate=0x202c670, delete_ok=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/commands/trigger.c:3548 #25 0x000000000055baf5 in AfterTriggerFireDeferred () at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/commands/trigger---Type <return> to continue, or q <return> to quit--- .c:3795 #26 0x000000000048c8e5 in CommitTransaction () at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/transam/xact.c:1785 #27 0x000000000048cf3d in CommitTransactionCommand () at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/transam/xact.c:2563 #28 0x000000000064017b in finish_xact_command () at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/tcop/postgres.c:2452 #29 finish_xact_command () at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/tcop/postgres.c:2441 #30 0x00000000006437dd in exec_simple_query (query_string=0x16e9e60 "COMMIT;") at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/tcop/postgres.c:1051 #31 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=0x1613cc0 "siphon", username=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/tcop/postgres.c:3976 #32 0x0000000000604443 in BackendRun (port=0x16570a0) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/postmaster/postmaster.c:3612 #33 BackendStartup (port=0x16570a0) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/postmaster/postmaster.c:3302 #34 ServerLoop () at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/postmaster/postmaster.c:1466 #35 0x0000000000604eb1 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/postmaster/postmaster.c:1127 #36 0x000000000045a720 in main (argc=5, argv=0x1613180) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/main/main.c:199
mydb=# SELECT name, current_setting(name) FROM pg_settings WHERE source NOT IN ('default', 'override'); name | current_setting ------------------------------+---------------------------------- application_name | psql autovacuum | off checkpoint_completion_target | 0.9 checkpoint_segments | 32 client_encoding | UTF8 DateStyle | ISO, MDY default_text_search_config | pg_catalog.english effective_cache_size | 3300MB effective_io_concurrency | 8 external_pid_file | /var/run/postgresql/9.1-main.pid fsync | off full_page_writes | off hot_standby | on hot_standby_feedback | on lc_messages | en_US.UTF-8 lc_monetary | en_US.UTF-8 lc_numeric | en_US.UTF-8 lc_time | en_US.UTF-8 listen_addresses | * log_directory | /var/log/postgresql log_file_mode | 0640 log_line_prefix | %t %p log_min_duration_statement | 10s log_rotation_size | 100MB log_timezone | UTC logging_collector | on maintenance_work_mem | 512MB max_connections | 200 max_stack_depth | 2MB max_standby_archive_delay | 2h max_standby_streaming_delay | 2h max_wal_senders | 0 port | 5432 shared_buffers | 512MB ssl | on ssl_renegotiation_limit | 0 synchronous_commit | off TimeZone | UTC unix_socket_directory | /var/run/postgresql wal_keep_segments | 1700 wal_level | minimal (41 rows)
-- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance