Hello, I'm looking for some insight into an intermittent PostgreSQL performance problem that has been very troublesome. We're running PG 8.3.5 on a CentOS 5 server (kernel 2.6.18-8.el5, dual Xeon 2.00 GHz, 4 GB RAM, 600 GB RAID-10 SCSI array).

The problem in a nutshell: on some days, a nightly sampling process (which reads data from one very large table and writes to another) runs about two orders of magnitude slower than usual, and disk IO goes through the roof. The sampling process (which starts at 1am and usually takes ~30 minutes) takes many hours to complete and eventually interferes with other processes that need to access the database. Processes that need to write to the db get backed up, and eventually data gets dropped (i.e., in-memory queues waiting for db writes fill up). The problem happens on only one or two seemingly random days per week, and there is generally no other db activity during this time (pg_stat_activity shows mostly idle connections). It seems as if the db cache is not being used properly and heavy disk usage results, but I'm not sure how to test this assumption (I've sketched what I plan to try in the P.S. below).

Details are as follows:

1) The db contains a "raw_data" table which receives on the order of 10-15 million rows per day. Two weeks of data are kept, for a total table size of about 40 GB (including indexes).

2) Every day, a process samples data from the "raw_data" table and stores it in the "sampled_data" table for long-term storage. The sampling is stratified across two dimensions: time of day (time range) and item name (a text field). That is, the day is divided into about 5 chunks to ensure that we sample enough data for each chunk of time, for each item.

3) The sampling process happens in two passes. The first pass examines the data to determine the sample size required for each (time of day, item name) pair; this consists of running aggregate queries (stddev, count, etc.) over the entire dataset to be sampled, with sample sizes calculated for all items at once for a given chunk of time. The second pass performs the actual random sampling of the data and stores the samples in the "sampled_data" table. It is this second pass that runs about two orders of magnitude slower on the bad days!

4) After the sampling process finishes, the oldest day's worth of data is deleted from the "raw_data" table.

The sampling query which runs really slowly on some days looks something like this:

INSERT INTO sampled_data (item_name, timestmp, ...)
SELECT item_name, timestmp, ...
FROM raw_data
WHERE timestmp >= ? AND timestmp < ?
  AND item_name = ?
  AND some_data_field IS NOT NULL
ORDER BY random()
LIMIT ?;

We have done a great deal of PG tuning, including autovacuum settings for the "raw_data" table. Autovacuum kicks in like clockwork on that table every day after the sampling process finishes (deleting one day's worth of data from "raw_data" is roughly a 7% change in table size). Changes made to postgresql.conf include:

max_connections = 250
shared_buffers = 1024MB
work_mem = 32MB
maintenance_work_mem = 256MB
max_fsm_pages = 10000000
max_fsm_relations = 30000
checkpoint_segments = 64
checkpoint_timeout = 10min
checkpoint_warning = 1min

Any pointers on how to troubleshoot this?

Mike
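
P.S. For the cache assumption, what I was planning to try is snapshotting the block hit/read counters for the tables involved before and after the nightly run, and comparing good nights to bad ones. A rough sketch (table names are from our schema; the counters in pg_statio_user_tables are cumulative since the last stats reset, so it's the before/after deltas that matter):

SELECT relname,
       heap_blks_read, heap_blks_hit,
       idx_blks_read, idx_blks_hit,
       -- fraction of heap block requests served from shared_buffers
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 4) AS heap_hit_ratio
FROM pg_statio_user_tables
WHERE relname IN ('raw_data', 'sampled_data');

If the hit ratio collapses on the bad nights, that would seem to confirm that the working set is falling out of cache.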
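
I also intend to capture the plan for the SELECT half of the slow query on both a good night and a bad night, to see whether the plan itself changes. Something along these lines (the literal timestamps, item name, and limit below are made-up placeholders for our real parameters):

EXPLAIN ANALYZE
SELECT item_name, timestmp
FROM raw_data
WHERE timestmp >= '2009-01-06 00:00' AND timestmp < '2009-01-06 04:48'
  AND item_name = 'example_item'       -- placeholder item
  AND some_data_field IS NOT NULL
ORDER BY random()
LIMIT 1000;                            -- placeholder sample size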
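
Finally, I was thinking of enabling some extra logging so the bad nights leave a trace in the server log. As far as I can tell, all of these settings exist in 8.3:

log_checkpoints = on               # log timing and buffers written for each checkpoint
log_min_duration_statement = 1000  # log any statement running longer than 1s
log_lock_waits = on                # log queries that wait longer than deadlock_timeout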