On Tue, Dec 17, 2013 at 3:25 AM, Tim Kane <tim.kane@xxxxxxxxx> wrote:
Thanks Jeff, MagnusThanks for the suggestions.This morning the same issue occurred, but this time it also complained that it failed to write to pg_clog (previous days, the only failure was in writing to pg_stat_tmp)
...
2013-12-17 07:13:08 GMT LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on dev2013-12-17 07:25:15 GMT WARNING: terminating connection because of crash of another server
process
...
It’s never failed to write pg_xlog’s though – or at least, never reported a failure.
If your log file is on the same partition as all the other stuff, you are probably losing log messages because there is no room to log them. The mangled line above tends to support this. Logging stopped for 12 minutes, until the crash freed up some space so it could resume.
For now, I’ve altered the pg_stat_tmp path and we’ll see what happens tomorrow – Ideally though, yes.. I’d like to see this happening at runtime and get a better picture of what’s blowing out here.Further to this however, I notice that the base directory is consuming more disk than I would have expected it to (all our relations are stored in tablespaces assigned to other disks).Looking for the larger culprits, I note the following files consuming 4.4GB9.2/main/base/74641/22609579.2/main/base/74641/2260957.19.2/main/base/74641/2260957.29.2/main/base/74641/2260957.39.2/main/base/74641/2260957.4I notice the timestamps are just shy of the time we experience that pg_stat_tmp issue.-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.1-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.2-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.3-rw------- 1 postgres postgres 328466432 Dec 17 06:56 2260957.4They appear to be orphaned oid’s, though I’m not sure this is definitive:clone=# select pg_relation_filepath(2260957);
The transaction that was making the table probably never committed, so its entry never becomes visible. I don't know of any good way of finding out what the name of an uncommitted object will eventually be, which is rather annoying when trying to monitor massives loads that happen inside a large transaction.
(And what you should really be doing is select relname from pg_class where relfilenode=2260957, the relfilenode is often the same as the oid, but not always.)
Cheers,
Jeff