COMMIT stuck for days after bulk delete

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



We have a 9.1.11 backend (Ubuntu 12.04 x86_64, m1.medium EC2 instance)
which seems to be stuck at COMMIT for 2 days now:

mydb=# SELECT procpid, waiting, current_query,
CURRENT_TIMESTAMP - query_start AS query_elapsed,
CURRENT_TIMESTAMP - xact_start AS xact_elapsed
FROM pg_stat_activity WHERE procpid != pg_backend_pid() AND
current_query != '<IDLE>';
-[ RECORD 1 ]-+-----------------------
procpid       | 6061
waiting       | f
current_query | COMMIT;
query_elapsed | 2 days 08:59:17.619142
xact_elapsed  | 3 days 15:48:10.739912


The transaction behind that COMMIT has been the only thing running on
this Postgres instance for the past 3 days or so, since Postgres was
started on that machine. I spun the EC2 instance for this database up
solely to test a database subsetting process, which is what the
transaction was doing before it got stuck at COMMIT -- using a bunch
of DELETEs and ALTER TABLE ... DROP|ADD CONSTRAINTs to delete 90% or
so of our data in order to be able to pg_dump a slimmed-down
development copy.

The EC2 instances we use have separate EBS-backed volumes for the
PostgreSQL data and WAL directories. The backend in question seems to
be stuck reading a ton of data from the data partition: the monitoring
for those EBS volumes shows those volumes have been hammered reading a
constant aggregate 90MB/sec since that COMMIT started. The write
bandwidth to the postgresql-data partition has been almost nil since
the COMMIT, and there has been no read/write activity on the WAL
volumes.

Here, we can see that backend has managed to read 22 TB despite the
fact that the entire database is only 228 GB on disk.

$ sudo cat /proc/6061/io
rchar: 24505414843923
wchar: 23516159014
syscr: 2991395854
syscw: 2874613
read_bytes: 24791719338496
write_bytes: 22417580032
cancelled_write_bytes: 221208576

$ df -h /dev/md0 /dev/md1
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0        480G  228G  253G  48% /mnt/ebs/postgresql-data
/dev/md1         32G   20G   13G  61% /mnt/ebs/postgresql-wal

Running an strace on the backend shows a whole ton of read() calls and
the occasional lseek(). I grabbed a backtrace of the backend with gdb,
attached.

Attached also are the non-default pg_settings for this instance.
You'll notice that fsync, full_page_writes, and autovacuum are all
off: this is intentional, since this instance is transient and has
nothing important on it. There are no interesting errors in the
Postgres log files since it was spun up.

Any ideas on how to further diagnose or avoid this problem?

Josh
(gdb) bt
#0  0x00007fa235a73fe0 in read () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000006249fd in read (__nbytes=8192, __buf=0x7fa21f8d7ee0, 
    __fd=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/unistd.h:45
#2  FileRead (file=<optimized out>, buffer=0x7fa21f8d7ee0 "RI", amount=8192)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/storage/file/fd.c:1191
#3  0x000000000063d856 in mdread (reln=<optimized out>, 
    forknum=<optimized out>, blocknum=253211, buffer=0x7fa21f8d7ee0 "RI")
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/storage/smgr/md.c:646
#4  0x0000000000620556 in ReadBuffer_common (smgr=<optimized out>, 
    relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=253211, 
    mode=RBM_NORMAL, strategy=<optimized out>, hit=0x7fffa31cb7df "")
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/storage/buffer/bufmgr.c:440
#5  0x0000000000621014 in ReadBufferExtended (reln=0x18678a0, 
    forkNum=MAIN_FORKNUM, blockNum=253211, mode=RBM_NORMAL, 
    strategy=<optimized out>)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/storage/buffer/bufmgr.c:246
#6  0x000000000046ff81 in heapgetpage (scan=0x2202340, page=253211)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/heap/heapam.c:230
#7  0x000000000047049a in heapgettup_pagemode (scan=0x2202340, 
    dir=<optimized out>, nkeys=0, key=0x0)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/heap/heapam.c:824
#8  0x0000000000471506 in heap_getnext (scan=0x2202340, 
    direction=<optimized out>)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/heap/heapam.c:1349
#9  0x000000000058a50b in SeqNext (node=<optimized out>)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/nodeSeqscan.c:66
#10 0x0000000000579fa3 in ExecScanFetch (recheckMtd=0x58a4d0 <SeqRecheck>, 
    accessMtd=0x58a4e0 <SeqNext>, node=0x2201590)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execScan.c:82
#11 ExecScan (node=0x2201590, accessMtd=0x58a4e0 <SeqNext>, 
    recheckMtd=0x58a4d0 <SeqRecheck>)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execScan.c:167
#12 0x0000000000572a98 in ExecProcNode (node=0x2201590)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execProcnode.c:394
#13 0x000000000058587a in ExecLockRows (node=0x22013a0)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/nodeLockRows.c:54
#14 0x0000000000572948 in ExecProcNode (node=0x22013a0)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execPro---Type <return> to continue, or q <return> to quit---
cnode.c:486
#15 0x000000000056fd62 in ExecutePlan (dest=0xafdea0, 
    direction=<optimized out>, numberTuples=1, sendTuples=1 '\001', 
    operation=CMD_SELECT, planstate=0x22013a0, estate=0x22011e0)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execMain.c:1442
#16 standard_ExecutorRun (queryDesc=0x21ff220, direction=<optimized out>, 
    count=1)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execMain.c:316
#17 0x000000000059341d in _SPI_pquery (tcount=<optimized out>, 
    fire_triggers=0 '\000', queryDesc=<optimized out>)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/spi.c:2110
#18 _SPI_execute_plan (paramLI=0x21ff1d0, snapshot=<optimized out>, 
    crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=0 '\000', 
    tcount=1, plan=<optimized out>)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/spi.c:1922
#19 0x000000000059386e in SPI_execute_snapshot (plan=0x204a7d0, 
    Values=0x7fffa31cbcc0, Nulls=0x7fffa31cbed0 " ", snapshot=0x0, 
    crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=0 '\000', 
    tcount=1)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/spi.c:462
#20 0x00000000006c8d50 in ri_PerformCheck (qkey=0x7fffa31cbf90, 
    qplan=0x204a7d0, fk_rel=0x18678a0, pk_rel=0x1868c00, 
    old_tuple=0x7fffa31cca00, new_tuple=0x0, detectNewRows=1 '\001', 
    expect_OK=5, 
    constrname=0x7fffa31cc0d4 "api_apirecordconv_rec_submission_id_fkey")
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/utils/adt/ri_triggers.c:3427
#21 0x00000000006ca1f5 in RI_FKey_noaction_del (fcinfo=<optimized out>)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/utils/adt/ri_triggers.c:818
#22 0x0000000000555fda in ExecCallTriggerFunc (trigdata=0x7fffa31cc9d0, 
    tgindx=0, finfo=0x202c9b0, instr=0x0, per_tuple_context=<optimized out>)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/commands/trigger.c:1831
#23 0x000000000055668f in AfterTriggerExecute (per_tuple_context=0x2038280, 
    instr=0x0, finfo=0x202c9b0, rel=0x1868c00, event=0x21983a8, 
    trigdesc=<optimized out>)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/commands/trigger.c:3369
#24 afterTriggerInvokeEvents (events=0x1617ee0, firing_id=13, 
    estate=0x202c670, delete_ok=<optimized out>)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/commands/trigger.c:3548
#25 0x000000000055baf5 in AfterTriggerFireDeferred ()
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/commands/trigger---Type <return> to continue, or q <return> to quit---
.c:3795
#26 0x000000000048c8e5 in CommitTransaction ()
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/transam/xact.c:1785
#27 0x000000000048cf3d in CommitTransactionCommand ()
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/transam/xact.c:2563
#28 0x000000000064017b in finish_xact_command ()
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/tcop/postgres.c:2452
#29 finish_xact_command ()
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/tcop/postgres.c:2441
#30 0x00000000006437dd in exec_simple_query (query_string=0x16e9e60 "COMMIT;")
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/tcop/postgres.c:1051
#31 PostgresMain (argc=<optimized out>, argv=<optimized out>, 
    dbname=0x1613cc0 "siphon", username=<optimized out>)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/tcop/postgres.c:3976
#32 0x0000000000604443 in BackendRun (port=0x16570a0)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/postmaster/postmaster.c:3612
#33 BackendStartup (port=0x16570a0)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/postmaster/postmaster.c:3302
#34 ServerLoop ()
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/postmaster/postmaster.c:1466
#35 0x0000000000604eb1 in PostmasterMain (argc=<optimized out>, 
    argv=<optimized out>)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/postmaster/postmaster.c:1127
#36 0x000000000045a720 in main (argc=5, argv=0x1613180)
    at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/main/main.c:199
mydb=# SELECT name, current_setting(name)        
  FROM pg_settings
  WHERE source NOT IN ('default', 'override');
             name             |         current_setting          
------------------------------+----------------------------------
 application_name             | psql
 autovacuum                   | off
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 32
 client_encoding              | UTF8
 DateStyle                    | ISO, MDY
 default_text_search_config   | pg_catalog.english
 effective_cache_size         | 3300MB
 effective_io_concurrency     | 8
 external_pid_file            | /var/run/postgresql/9.1-main.pid
 fsync                        | off
 full_page_writes             | off
 hot_standby                  | on
 hot_standby_feedback         | on
 lc_messages                  | en_US.UTF-8
 lc_monetary                  | en_US.UTF-8
 lc_numeric                   | en_US.UTF-8
 lc_time                      | en_US.UTF-8
 listen_addresses             | *
 log_directory                | /var/log/postgresql
 log_file_mode                | 0640
 log_line_prefix              | %t %p 
 log_min_duration_statement   | 10s
 log_rotation_size            | 100MB
 log_timezone                 | UTC
 logging_collector            | on
 maintenance_work_mem         | 512MB
 max_connections              | 200
 max_stack_depth              | 2MB
 max_standby_archive_delay    | 2h
 max_standby_streaming_delay  | 2h
 max_wal_senders              | 0
 port                         | 5432
 shared_buffers               | 512MB
 ssl                          | on
 ssl_renegotiation_limit      | 0
 synchronous_commit           | off
 TimeZone                     | UTC
 unix_socket_directory        | /var/run/postgresql
 wal_keep_segments            | 1700
 wal_level                    | minimal
(41 rows)
-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux