We do not use tablespaces at all. We do use table
partitioning very heavily, with many check
constraints. That is the only thing unusual about
the schema.
To my eyes, the birds appear to be flying pretty
darned fast, though we have not figured out how
to remove the message bands quickly without
cutting off their feet.
The server is a virtual machine, and at this point
I will ask the sys admins to get a non-virtual
server running to reconfirm the
problem.
Thanks
From: Tom Lane <tgl@xxxxxxxxxxxxx>
To: Mark Dilger <markdilger@xxxxxxxxx>
Cc: deepak <deepak.pn@xxxxxxxxx>; Alban Hertroys <haramrae@xxxxxxxxx>; "pgsql-general@xxxxxxxxxxxxxx" <pgsql-general@xxxxxxxxxxxxxx>
Sent: Wednesday, May 23, 2012 11:17 AM
Subject: Re: [GENERAL] FATAL: lock file "postmaster.pid" already exists
Mark Dilger <markdilger@xxxxxxxxx> writes:
> Prior to posting to the mailing list, we made some
> changes in postmaster.c to identify where time was
> being spent. Based on the elog(NOTICE,...) lines
> we put in the file, we determined the time was spent
> inside RemovePgTempFiles.
> I then altered RemovePgTempFiles to take a starttime
> parameter and, while recursing, to check if more than
> 5 seconds has passed since it started. I did not want
> to add the complexity of setting an alarm and catching
> the signal, so I just made the code check the wallclock
> time at each step of the recursion. When more than
> 5 seconds has passed, it does not recurse further.
> After making this change, we have not been able to
> reproduce the slowness.
OK, so we're back to the original question: how could this possibly be
taking that long? Have you got thousands of tablespaces (and if so why)?
Does your system have a habit of crashing at times when there are
thousands of temp files? Maybe you're using IP over avian carriers to
access your SAN? It just doesn't make any sense given the information
you've provided.
regards, tom lane