Thanks Jeff, Magnus, for the suggestions.

This morning the same issue occurred, but this time it also complained that it failed to write to pg_clog (on previous days, the only failure was in writing to pg_stat_tmp):

2013-12-17 07:13:04 GMT DETAIL: Could not write to file "pg_clog/0004" at offset 212992: No space left on device.
2013-12-17 07:13:05 GMT ERROR: could not access status of transaction 0
2013-12-17 07:13:05 GMT DETAIL: Could not write to file "pg_clog/0004" at offset 212992: No space left on device.
2013-12-17 07:13:06 GMT ERROR: could not access status of transaction 0
2013-12-17 07:13:06 GMT DETAIL: Could not write to file "pg_clog/0004" at offset 212992: No space left on device.
2013-12-17 07:13:07 GMT LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:07 GMT LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:08 GMT LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:08 GMT LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:25:15 GMT WARNING: terminating connection because of crash of another server process
2013-12-17 07:25:15 GMT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-12-17 07:25:15 GMT HINT: In a moment you should be able to reconnect to the database and repeat your command.
2013-12-17 07:25:16 GMT LOG: all server processes terminated; reinitializing
2013-12-17 07:25:17 GMT LOG: database system was interrupted; last known up at 2013-12-17 07:08:22 GMT
2013-12-17 07:25:17 GMT LOG: database system was not properly shut down; automatic recovery in progress
2013-12-17 07:25:17 GMT LOG: redo starts at 5BC/CF3F9888
2013-12-17 07:25:18 GMT LOG: could not open file "pg_xlog/00000001000005BC000000D6" (log file 1468, segment 214): No such file or directory
2013-12-17 07:25:18 GMT LOG: redo done at 5BC/D5FFFEC0
2013-12-17 07:25:18 GMT LOG: last completed transaction was at log time 2013-12-17 07:25:12.989653+00
2013-12-17 07:25:19 GMT LOG: database system is ready to accept connections
2013-12-17 07:25:19 GMT LOG: autovacuum launcher started

It has never failed to write to pg_xlog, though; or at least, it has never reported such a failure. For now, I've altered the pg_stat_tmp path and we'll see what happens tomorrow. Ideally though, yes, I'd like to catch this happening at runtime and get a better picture of what's blowing out here.

Further to this, I notice that the base directory is consuming more disk than I would expect, given that all our relations are stored in tablespaces assigned to other disks. Looking for the larger culprits, I note the following files consuming 4.4GB:

9.2/main/base/74641/2260957
9.2/main/base/74641/2260957.1
9.2/main/base/74641/2260957.2
9.2/main/base/74641/2260957.3
9.2/main/base/74641/2260957.4

Their timestamps are just shy of the time we experienced the pg_stat_tmp issue.
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.1
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.2
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.3
-rw------- 1 postgres postgres  328466432 Dec 17 06:56 2260957.4

They appear to be orphaned OIDs, though I'm not sure this check is definitive:

clone=# select pg_relation_filepath(2260957);
 pg_relation_filepath
----------------------

(1 row)

I would have thought these were temporary files, except that I have temp_tablespaces set elsewhere.

Are these truly orphaned files that PostgreSQL no longer needs (possibly left behind by a crash)? Is there a mechanism to scan the PGDATA directories to locate such files?

FYI, we're running 9.2.4.

Thanks again,

Tim
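P.S. The sort of scan I have in mind would be something like the sketch below: list the filenodes pg_class knows about, list the numeric filenames on disk, and diff the two. The paths and /tmp filenames are illustrative only, and the comparison is only trustworthy on a quiet cluster, since relfilenodes can change under concurrent DDL and mapped catalogs (relfilenode = 0) will appear as false positives.

    # run on a quiet cluster; cd into this database's directory under base/
    cd 9.2/main/base/74641

    # filenodes pg_class knows about ('clone' assumed to be database OID 74641)
    psql -d clone -Atc "select relfilenode from pg_class where relfilenode <> 0" \
      | sort -u > /tmp/known

    # numeric filenames on disk, with fork (_fsm/_vm/_init) and segment (.N)
    # suffixes stripped back to the base filenode
    ls | grep -E '^[0-9]+' \
      | sed -E 's/(_fsm|_vm|_init)?(\.[0-9]+)?$//' \
      | sort -u > /tmp/ondisk

    # anything on disk that pg_class doesn't know about
    comm -23 /tmp/ondisk /tmp/known

Anything this flags would still want manual verification (pg_filenode.map covers the mapped catalogs) before deleting a single byte.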
From: Jeff Janes <jeff.janes@xxxxxxxxx>
Date: Monday, 16 December 2013 17:32
To: Tim Kane <tim.kane@xxxxxxxxx>
Cc: pgsql-general General <pgsql-general@xxxxxxxxxxxxxx>
Subject: Re: pg_stat_tmp

On Mon, Dec 16, 2013 at 5:57 AM, Tim Kane <tim.kane@xxxxxxxxx> wrote:

Is that the only thing in the logs? pg_stat_tmp problems should not bring down your database. But problems with pg_xlog running out of space certainly can, and those should also be logged.
We don't know how large it is getting! If pg_stat_tmp shares the same partition as pg_xlog, base (as in the default configuration), and pg_log, then any of those things could be filling up the partition, and pg_stat_tmp could just be the canary rather than the culprit.

Anyway, you don't need to use a symlink; you can just change stats_temp_directory to point somewhere else.

Your best bet is to run "du" or something similar to figure out where your space is actually going.

Cheers,

Jeff
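For illustration, the relocation and the "du" check might look like this; the paths are hypothetical, and stats_temp_directory only needs a reload, not a restart:

    # postgresql.conf: move the stats temp file off the data partition.
    # It is rewritten continuously and is disposable across restarts, so a
    # small tmpfs (example path below) is a reasonable home for it.
    stats_temp_directory = '/var/run/postgresql/pg_stat_tmp'

    # pick up the change without a restart
    pg_ctl reload -D /var/lib/postgresql/9.2/main

    # then see which top-level directories are actually eating the partition
    du -xh --max-depth=1 /var/lib/postgresql/9.2/main | sort -h

If pg_xlog is what dominates that output, the stats file really was just the canary.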