Re: Trying to track down weird query stalls

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 30, 2009 at 12:42 PM,  <dan@xxxxxxxxx> wrote:
>> On Mon, Mar 30, 2009 at 1:50 PM,  <dan@xxxxxxxxx> wrote:
>>> I'm running a 64-bit build of Postgres 8.3.5 on AIX 5.3, and have a
>>> really
>>> strange, annoying transient problem with one particular query stalling.
>>>
>>> The symptom here is that when this query is made with X or more records
>>> in
>>> a temp table involved in the join (where X is constant when the problem
>>> manifests, but is different between manifestations) the query takes
>>> about
>>> 20 minutes. When the query is made with X-1 records it takes less than a
>>> second. It's just this one query -- for everything else the system's
>>> nice
>>> and snappy. The machine load's never above 9 (it's a 32 CPU box) and
>>> hasn't had less than 60G of free system memory on it.
>>>
>>> An EXPLAIN ANALYZE of the two queries (with X-1 and X records) is even
>>> more bizarre. Not only are the two query plans identical (save trivial
>>> differences because of the record count differences) but the explain
>>> EXPLAIN ANALYZE total runtimes are near-identical -- the fast case
>>> showed
>>> 259ms, the slow case 265ms.
>>
>> log_min_duration_statement is a good place to start, but it sounds
>> like the query plan you're getting when you test it by hand isn't the
>> same one that's actually executing, so I'm not sure how far that's
>> going to get you.  contrib/auto_explain sounds like it would be just
>> the thing, but unfortunately that's an 8.4 feature which hasn't been
>> released yet.  I'm not sure whether it could be built and run against
>> 8.3.
>
> I'm not executing any of the EXPLAINs by hand, because I didn't want to
> have to worry about typos or filling in temp tables with test data. Inside
> the app the SQL for the problematic query's stored in a variable -- when
> the task runs with debugging enabled it first executes the query with
> EXPLAIN ANALYZE prepended and dumps the output, then it executes the query
> itself. It's possible something's going wrong in that, but the code's
> pretty simple.
>
> Arguably in this case the actual query should run faster than the EXPLAIN
> ANALYZE version, since the cache is hot. (Though that'd only likely shave
> a few dozen ms off the runtime)

Joining a lot of tables together?  Could be GEQO kicking in.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux