Hi all, our postgresql DB was running fine for a long time, but suddenly we encountered a huge problem which we got fixed only temporarily. We are running debian stable with postgresql 8.1.11. Our app is connecting via JDBC and uses Prepared Statements. We are not running autovacuum but a nightly VACUUM ANALYZE at 3:00 am (without 'full' option of course). Our largest table has 80,000,000 records and has a relation size of 4.4 GB. Indizes of this table are 5.5 GB (total_relation_size = 10GB). this table is queried a lot of times. The second largest table is not even 20% of the largest table. Other tables range from 10KB to 300MB (total_relation_size). The server is dedicated DB server with dual cpu (AMD Athlon(tm) 64 X2), 8 GB of ram, 3ware RAID-1 with SATA harddisks. Everything ran fine. The Vacuum process took quite a long time each night (about 60 minutes) but we didn't care as it was a very low traffic time. Suddenly one morning the database was still running but it was VERY slow. Lots of SELECT queries were hanging around waiting for an answer. The VACUUM process was already done at this moment. First try was to restart the database. After the restart the problem occured again. The vaccum process was already done BEFORE the restart and was not run again. We just restarted the database. Next try was a REINDEX while disonnecting all clients, but it didn't helped either. Next try was to disconnect all clients again (servlet containers), then we did a restart of the database and run "VACUUM ANALYZE" as the only query with no other concurrent queries. It took an hour. Afterwards we started the database and everything ran smoothly. We thought it might have been only some kind of weired circumstances. So we left everything else as it was. Next night the Vacuum process did run again and after this the database slowed down immediately. The same behaviour as the night before. We stopped the database, disconnected all clients and ran VACUUM ANALYZE without concurrent queries. After restarting everything was fine again. We looked in our cpu monitoring and saw that we have huge IOwait while VACUUM is running, not unusual though. But just after VACUUM was finished, the userCPU load raised to 200% (dual cpu). At this time in the early morning there were only very few users using our app. We analyzed the situation further and saw that most queries had a very long duration time (minutes instead of milliseconds). BTW: the vacuum process was not running at this moment. It was logged before as done. As we cannot afford a downtime every morning, we disabled the VACUUM process at night. Next morning everything was fine and is till then (10 days ago). Of course, not vacuuming at all is not a good idea. I don't have any clue what is happening. My thoughts about this as follows (might be totally wrong): * Maybe the Vacuum analyze process has not enough memory and therefore could not ANALYZE the tables correctly. It then writes wrong statistics to the database which results in wrong execution plans using sequence scans instead of index scans. This only happens if the vacuum analyze process runs concurrently with user requests. If it runs on its own, the vacuum process has enough memory and writes correct statistics to the database. Here are some of our configuration parameters. We never really tweaked it as it ran fine. We just raised some parameters. The following list should show all parameters changed from the default: $ cat /etc/postgresql/8.1/main/postgresql.conf | grep -v '^[[:space:]]*#' | grep -v '^$' listen_addresses = '*' port = 5432 max_connections = 300 unix_socket_directory = '/var/run/postgresql' shared_buffers = 30000 work_mem = 10240 maintenance_work_mem = 163840 vacuum_mem = 32000 max_fsm_pages = 500000 bgwriter_lru_percent = 10.0 bgwriter_lru_maxpages = 100 bgwriter_all_percent = 5 bgwriter_all_maxpages = 200 wal_buffers = 16 checkpoint_segments = 10 checkpoint_warning = 3600 effective_cache_size = 180000 random_page_cost = 3 log_min_messages = info log_min_error_statement = warning log_min_duration_statement = 500 log_line_prefix = '%t [%p]' stats_command_string = off stats_block_level = on stats_row_level = on stats_reset_on_server_start = on autovacuum = off autovacuum_naptime = 3000 lc_messages = 'en_US' lc_monetary = 'en_US' lc_numeric = 'en_US' lc_time = 'en_US' I already tried to google my problem and came across some answers which seems to fit, but were all slightly different. Of course updating to 8.3 or restore the whole database might help, but i really want to understand what i did wrong to improve my skills. If any further information is needed, please ask. I hope my information was not too veborse and somebody can help me with my problem. Your help is very appreciated. kind regards, janning