Thanks Jeff, Magnus, for the suggestions.

This morning the same issue occurred, but this time it also complained that it failed to write to pg_clog (on previous days, the only failure was in writing to pg_stat_tmp):

2013-12-17 07:13:04 GMT DETAIL: Could not write to file "pg_clog/0004" at offset 212992: No space left on device.
2013-12-17 07:13:05 GMT ERROR: could not access status of transaction 0
2013-12-17 07:13:05 GMT DETAIL: Could not write to file "pg_clog/0004" at offset 212992: No space left on device.
2013-12-17 07:13:06 GMT ERROR: could not access status of transaction 0
2013-12-17 07:13:06 GMT DETAIL: Could not write to file "pg_clog/0004" at offset 212992: No space left on device.
2013-12-17 07:13:07 GMT LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:07 GMT LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:08 GMT LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:08 GMT LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:25:15 GMT WARNING: terminating connection because of crash of another server process
2013-12-17 07:25:15 GMT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-12-17 07:25:15 GMT HINT: In a moment you should be able to reconnect to the database and repeat your command.
2013-12-17 07:25:16 GMT LOG: all server processes terminated; reinitializing
2013-12-17 07:25:17 GMT LOG: database system was interrupted; last known up at 2013-12-17 07:08:22 GMT
2013-12-17 07:25:17 GMT LOG: database system was not properly shut down; automatic recovery in progress
2013-12-17 07:25:17 GMT LOG: redo starts at 5BC/CF3F9888
2013-12-17 07:25:18 GMT LOG: could not open file "pg_xlog/00000001000005BC000000D6" (log file 1468, segment 214): No such file or directory
2013-12-17 07:25:18 GMT LOG: redo done at 5BC/D5FFFEC0
2013-12-17 07:25:18 GMT LOG: last completed transaction was at log time 2013-12-17 07:25:12.989653+00
2013-12-17 07:25:19 GMT LOG: database system is ready to accept connections
2013-12-17 07:25:19 GMT LOG: autovacuum launcher started

It has never failed to write to pg_xlog, though; or at least, it has never reported such a failure. For now, I've altered the pg_stat_tmp path and we'll see what happens tomorrow. Ideally though, yes, I'd like to catch this happening at runtime and get a better picture of what's blowing out here.

Further to this, I notice that the base directory is consuming more disk than I would expect, given that all our relations are stored in tablespaces assigned to other disks. Looking for the larger culprits, I note the following files consuming 4.4GB:

9.2/main/base/74641/2260957
9.2/main/base/74641/2260957.1
9.2/main/base/74641/2260957.2
9.2/main/base/74641/2260957.3
9.2/main/base/74641/2260957.4

Their timestamps are just shy of the time we experienced the pg_stat_tmp issue.
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.1
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.2
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.3
-rw------- 1 postgres postgres  328466432 Dec 17 06:56 2260957.4

They appear to be orphaned OIDs, though I'm not sure this check is definitive:

clone=# select pg_relation_filepath(2260957);
 pg_relation_filepath
----------------------

(1 row)

I would have thought these were temporary files, except that I have temp_tablespaces set elsewhere.

Are these truly orphaned files that PostgreSQL no longer needs (possibly left behind by a crash)? Is there a mechanism to scan the PGDATA directories to locate such files?

FYI, we're running 9.2.4.

Thanks again,

Tim
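P.S. The sort of scan I have in mind would be something like the sketch below: list the filenodes pg_class knows about, list the numeric filenames on disk, and diff the two. The paths and /tmp filenames are illustrative only, and the comparison is only trustworthy on a quiet cluster, since relfilenodes can change under concurrent DDL and mapped catalogs (relfilenode = 0) will appear as false positives.

    # run on a quiet cluster; cd into this database's directory under base/
    cd 9.2/main/base/74641

    # filenodes pg_class knows about ('clone' assumed to be database OID 74641)
    psql -d clone -Atc "select relfilenode from pg_class where relfilenode <> 0" \
      | sort -u > /tmp/known

    # numeric filenames on disk, with fork (_fsm/_vm/_init) and segment (.N)
    # suffixes stripped back to the base filenode
    ls | grep -E '^[0-9]+' \
      | sed -E 's/(_fsm|_vm|_init)?(\.[0-9]+)?$//' \
      | sort -u > /tmp/ondisk

    # anything on disk that pg_class doesn't know about
    comm -23 /tmp/ondisk /tmp/known

Anything this flags would still want manual verification (pg_filenode.map covers the mapped catalogs) before deleting a single byte.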
From: Jeff Janes <jeff.janes@xxxxxxxxx>
Date: Monday, 16 December 2013 17:32
To: Tim Kane <tim.kane@xxxxxxxxx>
Cc: pgsql-general General <pgsql-general@xxxxxxxxxxxxxx>
Subject: Re: pg_stat_tmp

On Mon, Dec 16, 2013 at 5:57 AM, Tim Kane <tim.kane@xxxxxxxxx> wrote:

Is that the only thing in the logs? pg_stat_tmp problems should not bring down your database. But problems with pg_xlog running out of space certainly can, and those should also be logged.
We don't know how large it is getting! If pg_stat_tmp shares the same partition as pg_xlog, base (as in the default configuration), and pg_log, then any of those things could be filling up the partition, and pg_stat_tmp could just be the canary rather than the culprit.

Anyway, you don't need to use a symlink; you can just change stats_temp_directory to point somewhere else.

Your best bet is to run "du" or something similar to figure out where your space is actually going.

Cheers,

Jeff
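For illustration, the relocation and the "du" check might look like this; the paths are hypothetical, and stats_temp_directory only needs a reload, not a restart:

    # postgresql.conf: move the stats temp file off the data partition.
    # It is rewritten continuously and is disposable across restarts, so a
    # small tmpfs (example path below) is a reasonable home for it.
    stats_temp_directory = '/var/run/postgresql/pg_stat_tmp'

    # pick up the change without a restart
    pg_ctl reload -D /var/lib/postgresql/9.2/main

    # then see which top-level directories are actually eating the partition
    du -xh --max-depth=1 /var/lib/postgresql/9.2/main | sort -h

If pg_xlog is what dominates that output, the stats file really was just the canary.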