What else could I've done? COPY to unlogged tbl "hung"/locked the table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I'm relatively new to Pg.

Question: 
What could have happened to cause the COPY to unlogged table to hang?
And what would the recreate of the unlogged table have "released" that a
reboot of the cluster did not.
Any other place that I could've looked or done something else other than
recreating the unlogged table  and/or rebooting the cluster.

Timeline:
Day before, the cluster host had to be rebooted since it was unresponsive
due to perhaps "a massive loop in vi re substitution" (as dev mentioned it)

Yesterday, a copy to an unlogged table thru java was hung. Usually it is
instantaneous, but it was hung for 2hrs before they contacted me.

pg_locks showed me the following:
  pid  |   locktype    | dbase  |                 table         |
virtualtransaction | virtualxid |   mode           | granted
 26554 | relation      | dbname | pg_class                      | 8/13              
|            | AccessShareLock  | t
 26554 | relation      | dbname | pg_class_oid_index            | 8/13              
|            | AccessShareLock  | t
 26554 | relation      | dbname | pg_class_relname_nsp_index    | 8/13              
|            | AccessShareLock  | t
 26554 | relation      |        | pg_database                   | 8/13              
|            | AccessShareLock  | t
 26554 | relation      |        | pg_database_datname_index     | 8/13              
|            | AccessShareLock  | t
 26554 | relation      |        | pg_database_oid_index         | 8/13              
|            | AccessShareLock  | t
 26554 | relation      | dbname | pg_locks                      | 8/13              
|            | AccessShareLock  | t
 26554 | virtualxid    |        |                               | 8/13              
| 8/13       | ExclusiveLock    | t
 27288 | relation      | dbname | unlogged_tbl_idx              | 11/33             
|            | RowExclusiveLock | t
 27288 | relation      | dbname | unlogged_tbl                  | 11/33             
|            | RowExclusiveLock | t
 27288 | relation      | dbname | unlogged_tbl_key              | 11/33             
|            | RowExclusiveLock | t
 27288 | relation      | dbname | unlogged_tbl_pkey             | 11/33             
|            | RowExclusiveLock | t
 27288 | transactionid |        |                               | 11/33             
|            | ExclusiveLock    | t
 27288 | virtualxid    |        |                               | 11/33             
| 11/33      | ExclusiveLock    | t

When the developer killed the shell script from his end, it didn't release
the locks.
neither pg_cancel nor pg_terminate of the copy job worked.
so finally a kill -9 of the COPY os process caused the cluster to reboot.

2012-08-29 14:29:58 EDT [8119]: [2-1] LOG:  server process (PID 30468) was
terminated by signal 9: Killed
2012-08-29 14:29:58 EDT [8119]: [3-1] LOG:  terminating any other active
server processes
2012-08-29 14:29:58 EDT [25798]: [3-1] WARNING:  terminating connection
because of crash of another server process
2012-08-29 14:29:58 EDT [25798]: [4-1] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2012-08-29 14:29:58 EDT [25798]: [5-1] HINT:  In a moment you should be able
to reconnect to the database and repeat your command.

The load script was kicked off and it hung again.
A fast stop of the server failed, so an immediate stop had to be executed.

2012-08-29 14:45:44 EDT [8119]: [9-1] LOG:  received immediate shutdown
request
2012-08-29 14:45:44 EDT [27288]: [1-1] WARNING:  terminating connection
because of crash of another server process
2012-08-29 14:45:44 EDT [27288]: [2-1] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2012-08-29 14:45:44 EDT [27288]: [3-1] HINT:  In a moment you should be able
to reconnect to the database and repeat your command.
2012-08-29 14:45:44 EDT [27288]: [4-1] CONTEXT:  COPY unlogged_tbl, line 1:
"50005001|<abacus><address>5291 math
drive</address><address2/><alt_names/><city>atlantis..." 

The developer tried to execute using only the java class and not thru a
shell script.
The copy process was kicked off and it hung again.

I created another table using

db=# create table tmp_tbl as select * from unlogged_tbl;

and COPY to this tmp_tbl from the same file was instantaneous.
However i later noticed that this creates a logged table, NOT an unlogged
table (as i had expected :) )

I dropped and recreated the unlogged tables and that seems to have worked. 
No more hanging of the load script. And so far so good.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/What-else-could-I-ve-done-COPY-to-unlogged-tbl-hung-locked-the-table-tp5721983.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.


-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux