The random sampling query is normally pretty snappy. It usually takes
on the order of 1 second to sample a few thousand rows of data out of a
few million. The sampling is consistently quick, too. However, on some
days, the sampling starts off quick, then when the process starts
sampling from a different subset of data (different range of times for
the same day), the sampling query takes a couple minutes.
Regarding the concurrent vacuuming, this is definitely not happening. I
always check pg_stat_activity whenever the sampling process starts to
lag behind. I have never seen a vacuum running during this time.
Interesting idea to issue the EXPLAIN first... I will see if I can
instrument the sampling program to do this.
Thanks for your help Tom.
Mike
Tom Lane wrote:
Mike Charnoky <noky@xxxxxxxxxxx> writes:
The sampling query which runs really slow on some days looks something
like this:
INSERT INTO sampled_data
(item_name, timestmp, ... )
SELECT item_name, timestmp, ... )
FROM raw_data
WHERE timestmp >= ? and timestmp < ?
AND item_name=?
AND some_data_field NOTNULL
ORDER BY random()
LIMIT ?;
Hmph, I'd expect that that would run pretty slowly *all* the time :-(.
There's no good way to optimize "ORDER BY random()". However, it seems
like the first thing you should do is modify the program so that it
issues an EXPLAIN for that right before actually doing the query, and
then you could see if the plan is different on the slow days.
We have done a great deal of PG tuning, including the autovacuum for the
"raw_data" table. Autovacuum kicks like clockwork every day on that
table after the sampling process finishes (after one day's worth of data
is deleted from "raw_data" table, a roughly 7% change in size).
Also, are you sure you have ruled out the possibility that the problem
comes from autovac kicking in *while* the update is running?
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general