As a follow-up, here's a portion of the nightly vacuum's logs, just want to confirm if my conclusions are right:
INFO: vacuuming "pg_catalog.pg_attribute"
INFO: scanned index "pg_attribute_relid_attnam_index" to remove 3014172 row versions
DETAIL: CPU 0.20s/1.08u sec elapsed 3.72 sec.
INFO: scanned index "pg_attribute_relid_attnum_index" to remove 3014172 row versions
DETAIL: CPU 0.14s/0.89u sec elapsed 1.70 sec.
INFO: "pg_attribute": removed 3014172 row versions in 52768 pages
DETAIL: CPU 0.31s/0.30u sec elapsed 1.15 sec.
INFO: index "pg_attribute_relid_attnam_index" now contains 19578 row versions in 45817 pages
DETAIL: 3013689 index row versions were removed.
45668 index pages have been deleted, 34116 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "pg_attribute_relid_attnum_index" now contains 19578 row versions in 32554 pages
DETAIL: 3010630 index row versions were removed.
32462 index pages have been deleted, 24239 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_attribute": found 2278389 removable, 17319 nonremovable row versions in 52856 out of 57409 pages
DETAIL: 298 dead row versions cannot be removed yet.
There were 641330 unused item pointers.
0 pages are entirely empty.
CPU 1.44s/2.88u sec elapsed 10.55 sec.
INFO: "pg_attribute": stopping truncate due to conflicting lock request
INFO: analyzing "pg_catalog.pg_attribute"
INFO: "pg_attribute": scanned 30000 of 57409 pages, containing 10301 live rows and 0 dead rows; 10301 rows in sample, 24472 estimated total rows
So,
- About 3 million row versions churned through the table and its two indexes (i.e. dead rows which vacuum found and removed) - and that's per day, since this is a nightly operation.
- After the vacuum, the first index is left with 19578 rows in 45817 pages. That's a lot of mostly-empty pages, which should be reused over the next day, together with the free space in partially filled pages, right?
- Since pages are 8 KiB, ~46,000 pages is about 360 MiB of space - and that's just one of pg_attribute's indexes after the vacuum. Together with the other index (32554 pages, ~254 MiB) and the table itself (57409 pages, ~449 MiB), it adds up to more than 1 GB.
- The "stopping truncate..." message is not really significant, it would have shortened the data files from the end if there are empty pages at the end, which in this case there isn't a significant number of. The truncation can probably never happen on system tables like these since they are always used...?
The real question is: why is the total size (i.e. the number of pages) growing at an alarming rate? On one of the databases, we're seeing it almost double in size each week. Is the internal fragmentation of the data files really so significant?
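(As a sanity check on the arithmetic above, the actual sizes can be read straight from the server instead of being estimated from page counts; this assumes PostgreSQL 9.0+ for pg_table_size/pg_indexes_size:)

    SELECT pg_size_pretty(pg_table_size('pg_catalog.pg_attribute'))          AS heap,
           pg_size_pretty(pg_indexes_size('pg_catalog.pg_attribute'))        AS indexes,
           pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute')) AS total;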
Ok, a couple more questions:
- How come "0 pages are entirely empty" if there are only 17319 nonremovable rows spread across 52856 pages?
- What are "unused item pointers"?
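(In case it helps to quantify this: the pgstattuple contrib module reports live tuples, dead tuples and free space for a table, which should show where those 52856 pages actually go. A minimal sketch, assuming the contrib modules are installed and a 9.1+ server for CREATE EXTENSION:)

    -- pgstattuple does a full scan of the table, so expect some I/O
    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT tuple_count, dead_tuple_count, free_space, free_percent
    FROM pgstattuple('pg_catalog.pg_attribute');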
(I agree with your previous suggestions, will see if they can be implemented).
On 28 January 2016 at 00:13, Bill Moran <wmoran@xxxxxxxxxxxxxxxxx> wrote:
On Wed, 27 Jan 2016 23:54:37 +0100
Ivan Voras <ivoras@xxxxxxxxx> wrote:
>
> I've done my Googling, and it looks like this is a fairly common problem.
> In my case, there's a collection of hundreds of databases (10 GB+) with
> apps which are pretty much designed (a long time ago) with heavy use of
> temp tables - so a non-trivial system.
>
> The databases are vacuumed (not-full) daily, from cron
Vacuuming once a day is seldom often enough, except on very quiet
databases.
> (autovacuum was
> turned off some time ago for performance reasons), and still their size
> increases unexpectedly. By using some of the queries floating around on the
> wiki and stackoverflow[*], I've discovered that the bloat is not, as was
> assumed, in the user tables, but in the system tables, mostly in
> pg_attribute and pg_class.
The size increase isn't really unexpected. If you're only vacuuming once
per day, it's very easy for activity to cause active tables to bloat quite
a bit.
> This is becoming a serious problem, as I've seen instances of these tables
> grow to 6 GB+ (on a 15 GB total database), while still effectively
> containing on the order of 10,000 records or so. This is quite abnormal.
>
> For blocking reasons, we'd like to avoid vacuum fulls on these tables (as
> it seems like touching them will lock up everything else).
It will. But to get them back down to a reasonable size, you're going to
have to do a VACUUM FULL at least _once_. If you retune things correctly,
you shouldn't need any more FULLs after that 1 time.
> So, question #1: WTF? How could this happen, on a regularly vacuumed
> system? Shouldn't the space be reused, at least after a VACUUM? The issue
> here is not the absolute existence of the bloat space, it's that it's
> constantly growing for *system* tables.
With a lot of activity, once a day probably isn't regular enough.
> Question #2: What can be done about it?
I highly recommend turning autovacuum back on, then tweaking the autovacuum
parameters to prevent any performance issues.
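As a rough starting point, something like this in postgresql.conf (the values here are illustrative, not tuned recommendations - adjust them by experiment):

    autovacuum = on
    autovacuum_naptime = 1min               # how often the launcher checks for work
    autovacuum_vacuum_scale_factor = 0.05   # vacuum once ~5% of a table is dead (default 0.2)
    autovacuum_vacuum_cost_delay = 20ms     # throttle workers to limit the I/O hit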
However, if you're dead set against autovacuum, find out (using the queries
that are available all over the internet) which tables are bloating the
worst, and schedule additional vacuums via cron that vacuum _only_ the
problem tables. How often to run them will take some guesswork
and/or experimenting, but I would recommend at least once per hour. Since
you're only vacuuming selected tables, the performance impact should be
minimal.
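For example, a crontab entry along these lines (the database name is a placeholder; put in whichever tables your bloat queries flag):

    # min hour dom mon dow   command
    0 * * * *   vacuumdb --dbname=yourdb --table=pg_catalog.pg_attribute --table=pg_catalog.pg_class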
You'll have to do a VACUUM FULL on the bloated tables _once_ to get the size
back down, but if you pick a good schedule or use autovacuum with appropriate
settings, they shouldn't need a VACUUM FULL again after that.
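The one-time shrink itself is just the following - keep in mind that VACUUM FULL takes an ACCESS EXCLUSIVE lock on each table for its duration, so run it in a quiet window:

    VACUUM FULL VERBOSE pg_catalog.pg_attribute;
    VACUUM FULL VERBOSE pg_catalog.pg_class;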
--
Bill Moran