Mike Charnoky wrote:
Scott Marlowe wrote:
On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky <noky@xxxxxxxxxxx> wrote:
The random sampling query is normally pretty snappy. It usually takes on
the order of 1 second to sample a few thousand rows of data out of a few
million. The sampling is consistently quick, too. However, on some days,
the sampling starts off quick, then when the process starts sampling from a
different subset of data (different range of times for the same day), the
sampling query takes a couple minutes.
Then definitely look at saving explain plans before execution to
compare fast runs to slow runs. This definitely sounds like occasional
bad query plans to me so far.
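One low-effort way to capture those plans is a sketch along these lines, assuming the server is PostgreSQL 8.4 or later with contrib modules installed (the auto_explain module logs plans of slow statements automatically):

```sql
-- Hedged sketch: load the contrib auto_explain module for this session
-- and log the plan of any statement running longer than 5 seconds.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '5s';
SET auto_explain.log_analyze = false;  -- log the plan only; skip per-node timing overhead
```

On a slow day the server log would then contain the plan that was actually chosen, with no changes to the application.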
Tom Lane wrote:
Mike Charnoky <noky@xxxxxxxxxxx> writes:
The sampling query which runs really slow on some days looks something
like this:
INSERT INTO sampled_data
(item_name, timestmp, ... )
SELECT item_name, timestmp, ...
FROM raw_data
WHERE timestmp >= ? and timestmp < ?
AND item_name=?
AND some_data_field NOTNULL
ORDER BY random()
LIMIT ?;
Hmph, I'd expect that that would run pretty slowly *all* the time :-(.
There's no good way to optimize "ORDER BY random()". However, it seems
like the first thing you should do is modify the program so that it
issues an EXPLAIN for that right before actually doing the query, and
then you could see if the plan is different on the slow days.
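Concretely, the program could issue something like the following sketch (reusing the column names from the query above; the ? placeholders stand for the same parameter values the program is about to bind) and save the output with a timestamp for later comparison:

```sql
-- Capture the plan immediately before running the real sampling query.
-- The ? placeholders shown here would be substituted with the actual
-- parameter values before sending the statement.
EXPLAIN
SELECT item_name, timestmp
FROM raw_data
WHERE timestmp >= ? AND timestmp < ?
  AND item_name = ?
  AND some_data_field NOTNULL
ORDER BY random()
LIMIT ?;
```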
I'm at the end of my rope here. Tried the following:
* Manually run ANALYZE on the table before running the sampling query.
This does not help; there are still times when the sampling runs slower
by two orders of magnitude and disk IO goes through the roof.
* Bump work_mem for the query to 128MB (up from 32MB). This did not
help. Also, no temp files were created in $PG/data/base/pgsql_tmp/, so
work_mem does not seem to be an issue.
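For reference, one way to scope such a bump to the sampling query alone (a sketch; the value is illustrative) is SET LOCAL inside a transaction, so other backends keep the server default:

```sql
-- Hedged sketch: raise work_mem only for this transaction.
BEGIN;
SET LOCAL work_mem = '128MB';
-- ... run the INSERT ... SELECT ... ORDER BY random() here ...
COMMIT;
```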
* EXPLAINs look nearly identical whether the query runs quickly or slowly
The thing that gets me is: why does this query totally hose the entire
database? Other clients have a very hard time writing to the db while
this sampling query is running slowly; disk IO is maxed out. This just
doesn't seem right. Why would a single pg backend strangle db
performance to such an extent? Aren't there ways to throttle this back?
Due to the nature of the sampling (need to limit using several
parameters with a WHERE clause), I can't just generate random numbers to
select data that I need. Looks like I am stuck using ORDER BY RANDOM().
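For what it's worth, one common workaround is to pre-filter with random() so the sort only sees a small fraction of the rows. This is a sketch, not a drop-in replacement: it assumes the number of rows N matched by the WHERE clause can be estimated, so that the probability p can be chosen comfortably above LIMIT/N (otherwise the query can return fewer rows than requested):

```sql
-- Hedged sketch: keep each qualifying row with probability p (0.01 here,
-- purely illustrative), then sort only the surviving rows. The full
-- ORDER BY random() sort over millions of rows is avoided.
SELECT item_name, timestmp
FROM raw_data
WHERE timestmp >= ? AND timestmp < ?
  AND item_name = ?
  AND some_data_field NOTNULL
  AND random() < 0.01
ORDER BY random()
LIMIT ?;
```

The trade-off is that p must be tuned per time range; too small and the LIMIT is not met, too large and the sort shrinks less.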
The only other option at this point seems to be to implement
TABLESAMPLE, probably starting with the academic work that Neil Conway
published (http://neilconway.org/talks/hacking/).
Mike
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general