On Mon, Mar 30, 2009 at 12:42 PM, <dan@xxxxxxxxx> wrote: >> On Mon, Mar 30, 2009 at 1:50 PM, <dan@xxxxxxxxx> wrote: >>> I'm running a 64-bit build of Postgres 8.3.5 on AIX 5.3, and have a >>> really >>> strange, annoying transient problem with one particular query stalling. >>> >>> The symptom here is that when this query is made with X or more records >>> in >>> a temp table involved in the join (where X is constant when the problem >>> manifests, but is different between manifestations) the query takes >>> about >>> 20 minutes. When the query is made with X-1 records it takes less than a >>> second. It's just this one query -- for everything else the system's >>> nice >>> and snappy. The machine load's never above 9 (it's a 32 CPU box) and >>> hasn't had less than 60G of free system memory on it. >>> >>> An EXPLAIN ANALYZE of the two queries (with X-1 and X records) is even >>> more bizarre. Not only are the two query plans identical (save trivial >>> differences because of the record count differences) but the explain >>> EXPLAIN ANALYZE total runtimes are near-identical -- the fast case >>> showed >>> 259ms, the slow case 265ms. >> >> log_min_duration_statement is a good place to start, but it sounds >> like the query plan you're getting when you test it by hand isn't the >> same one that's actually executing, so I'm not sure how far that's >> going to get you. contrib/auto_explain sounds like it would be just >> the thing, but unfortunately that's an 8.4 feature which hasn't been >> released yet. I'm not sure whether it could be built and run against >> 8.3. > > I'm not executing any of the EXPLAINs by hand, because I didn't want to > have to worry about typos or filling in temp tables with test data. Inside > the app the SQL for the problematic query's stored in a variable -- when > the task runs with debugging enabled it first executes the query with > EXPLAIN ANALYZE prepended and dumps the output, then it executes the query > itself. It's possible something's going wrong in that, but the code's > pretty simple. > > Arguably in this case the actual query should run faster than the EXPLAIN > ANALYZE version, since the cache is hot. (Though that'd only likely shave > a few dozen ms off the runtime) Joining a lot of tables together? Could be GEQO kicking in. -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance