We are running Postgres 8.3 on our production servers on Amazon EC2.
We have a reoccurring problem of slowness initially in every 2 months, after enabling autovacuum every 6 months what only full vacuum can solve. It's kind of a problem as it requires 2hrs downtime and we want to avoid that. What we are doing now is moving to Postgres 9.1 as a desperate measure, but we would like to understand better the root cause of the problem.
If you've seen similar before or have an idea about what's going on here we would really appreciate that insight.
Please see bellow the details.
Thanks guys.
Laszlo
Details:
We have a table with 2-3 million rows, with 30 fields with type varchar, text and number. The table is heavily updated and read too. A super simple query like the following takes unacceptable long time to run. 4-15 sec compared to the under a second run when Postgres works well. We tried index rebuild, more disks, more memory, but nothing helped, only full vacuum. It seems to me that Postgres can't keep internal things in order to operate nicely. Also I see extreme high IO load on the server when this problem occurs. And again, after a full vacuum, things are perfect.
You can see the explain output here:
explain analyze select count(comment0_.id) as col_0_0_ from Comment comment0_ where comment0_.sourceId='xxx' limit 2;
Before full vacuum:
'Limit (cost=14446.76..14446.77 rows=1 width=38) (actual time=4664.694..4664.696 rows=1 loops=1)'
' -> Aggregate (cost=14446.76..14446.77 rows=1 width=38) (actual time=4664.691..4664.692 rows=1 loops=1)'
' -> Bitmap Heap Scan on comment comment0_ (cost=96.07..14436.60 rows=4060 width=38) (actual time=9.135..4645.847rows=18468 loops=1)'
' Recheck Cond: ((sourceid)::text = 'xxx'::text)'
' -> Bitmap Index Scan on comment_sourceid_idx (cost=0.00..95.05 rows=4060 width=0) (actual time=6.424..6.424 rows=18468 loops=1)'
' Index Cond: ((sourceid)::text = 'xxx'::text)'
'Total runtime: 4664.766 ms'
After full vacuum:
'Limit (cost=11877.89..11877.90 rows=1 width=38) (actual time=344.031..344.034 rows=1 loops=1)'
' -> Aggregate (cost=11877.89..11877.90 rows=1 width=38) (actual time=344.024..344.024 rows=1 loops=1)'
' -> Bitmap Heap Scan on comment comment0_ (cost=87.46..11869.24 rows=3461 width=38) (actual time=9.391..326.931 rows=18513 loops=1)'
' Recheck Cond: ((sourceid)::text = 'xxx'::text)'
' -> Bitmap Index Scan on comment_sourceid_idx (cost=0.00..86.59 rows=3461 width=0) (actual time=6.749..6.749 rows=18555 loops=1)'
' Index Cond: ((sourceid)::text = 'xxx'::text)'
'Total runtime: 344.464 ms'