I'm running a 64-bit build of Postgres 8.3.5 on AIX 5.3, and have a really strange, annoying transient problem with one particular query stalling. The symptom here is that when this query is made with X or more records in a temp table involved in the join (where X is constant when the problem manifests, but is different between manifestations) the query takes about 20 minutes. When the query is made with X-1 records it takes less than a second. It's just this one query -- for everything else the system's nice and snappy. The machine load's never above 9 (it's a 32 CPU box) and hasn't had less than 60G of free system memory on it. An EXPLAIN ANALYZE of the two queries (with X-1 and X records) is even more bizarre. Not only are the two query plans identical (save trivial differences because of the record count differences) but the explain EXPLAIN ANALYZE total runtimes are near-identical -- the fast case showed 259ms, the slow case 265ms. When the slow query was actually run, though, it took 20 minutes. There were no blocked back ends shown in pg_stat_activity, and the back end itself was definitely moving. I trussed the back end stuck running the slow query and it spent nearly all its time doing kread() and kwrite() calls. The DB log didn't show anything interesting in it. I checked to make sure the SQL statement that was EXPLAIN ANALYZEd was the one actually executed, and I pulled the client code into the debugger and single-stepped through just to make sure it was getting stuck on that one SQL statement and it wasn't the client doing something unexpected. Just to be even more annoying, the problem goes away. Right now I can't trigger the problem. Last friday it happened reliably feeding 58 records into this query. The week before it was 38 records. Today? Nothing, the system's full of snappy. At the moment I'm at a loss as to what's going wrong, and the fact that I can't duplicate it right now when I actually have time to look into the problem's damned annoying. What I'm looking for are some hints as to where to look the next time it does happen, or things I can flip on to catch more in the log. (The logging parameters are all set at their defaults) I'm going to try attaching with dbx to get a stack trace (I assume this is relatively safe?) but past that I'm kind of stumped. Help? -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance