On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky <noky@xxxxxxxxxxx> wrote: > The random sampling query is normally pretty snappy. It usually takes on > the order of 1 second to sample a few thousand rows of data out of a few > million. The sampling is consistently quick, too. However, on some days, > the sampling starts off quick, then when the process starts sampling from a > different subset of data (different range of times for the same day), the > sampling query takes a couple minutes. Then definitely look at saving explain plans before execution to compare fast to slow runs. This definitely sounds like ocassionally bad query plans to me so far. > Regarding the concurrent vacuuming, this is definitely not happening. I > always check pg_stat_activity whenever the sampling process starts to lag > behind. I have never seen a vacuum running during this time. And if autovac is getting in the ways, try adjusting the various autovac options. spefically autovacuum_vacuum_cost_delay set to 10 or 20 (mS). > > Interesting idea to issue the EXPLAIN first... I will see if I can > instrument the sampling program to do this. > > Thanks for your help Tom. > > > Mike > > Tom Lane wrote: >> >> Mike Charnoky <noky@xxxxxxxxxxx> writes: >>> >>> The sampling query which runs really slow on some days looks something >>> like this: >> >>> INSERT INTO sampled_data >>> (item_name, timestmp, ... ) >>> SELECT item_name, timestmp, ... ) >>> FROM raw_data >>> WHERE timestmp >= ? and timestmp < ? >>> AND item_name=? >>> AND some_data_field NOTNULL >>> ORDER BY random() >>> LIMIT ?; >> >> Hmph, I'd expect that that would run pretty slowly *all* the time :-(. >> There's no good way to optimize "ORDER BY random()". However, it seems >> like the first thing you should do is modify the program so that it >> issues an EXPLAIN for that right before actually doing the query, and >> then you could see if the plan is different on the slow days. >> >>> We have done a great deal of PG tuning, including the autovacuum for the >>> "raw_data" table. Autovacuum kicks like clockwork every day on that >>> table after the sampling process finishes (after one day's worth of data >>> is deleted from "raw_data" table, a roughly 7% change in size). >> >> Also, are you sure you have ruled out the possibility that the problem >> comes from autovac kicking in *while* the update is running? >> >> regards, tom lane >> > > -- > Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- When fascism comes to America, it will be the intolerant selling it as diversity. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general