Re: VACUUM FULL hangs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Marcin Mańk wrote:
Hello list.

I recently tried to do a slony replica of my database, and doing it falied.
I retried, and then it succeeded (why it failed is another story).

This caused that in the replica there is a lot of dead tuples ( If i
understand correctly, a failure in creating the replica means a HUGE aborted
transaction - and Slony should TRUNCATE the table, getting rid of dead
tuples, but that is a subject for another list).

so I did vacuum full verbose analyze (does it make sense ?)

Fair enough. If you want empty tables TRUNCATE is probably a better bet though.

This hanged on a (quite large) table:

INFO:  vacuuming "public.calls"
INFO:  "calls": found 7980456 removable, 3989705 nonremovable row versions
in 296943 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 136 to 224 bytes long.
There were 891 unused item pointers.
Total free space (including removable row versions) is 1594703944 bytes.
197958 pages are or will become empty, including 0 at the end of the table.
212719 pages containing 1588415680 free bytes are potential move
destinations.

OK, so there are 7.9 million removable rows and 3.9 million nonremovable so truncate isn't an option since you have data you presumably want to keep. It estimates about 200,000 pages will become empty, but none of them are at the end of the table. This represents 1.5GB of unused disk-space.

I'm a bit puzzled as to how you managed to get so much free space at the start of the table. Did the replication work on the second try?

CPU 7.25s/3.28u sec elapsed 144.95 sec.
INFO:  index "calls_pkey" now contains 3989705 row versions in 8975 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.35s/0.59u sec elapsed 39.03 sec.
INFO:  index "calls_cli" now contains 3989705 row versions in 13504 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.51s/0.60u sec elapsed 58.60 sec.
INFO:  index "calls_dnis" now contains 3989705 row versions in 13600 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.60s/0.90u sec elapsed 27.05 sec.
INFO:  index "calls_u" now contains 3989705 row versions in 23820 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.92s/0.78u sec elapsed 80.51 sec.
INFO:  index "calls_z" now contains 3989705 row versions in 13607 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.60s/0.85u sec elapsed 39.77 sec.

It's done all the indexes (and seems to have done them quite quickly), and is presumably working on the data now.

It was hanging in this state for more than 3 hours, and I had to kill the
vacuum process.

From iostat I saw that there was continuous write activity, steadilly about
1.3 MB/s (the disk system can do about 40 MB/s), and there were iowait
processes. There was no read activity.

There were no other clients for that database (but there were clients in
other databases in the instance).

OK, so you might well be getting the vacuum writing one page, then WAL, then vacuum, etc. That will mean the disk spends most of its time seeking back and fore. How many disks do you have, and is the WAL on a separate set of disks?

I think it's just taking a long time because you have so many pages to move and not enough disk bandwidth. Of course the root of the problem is that you had so many dead rows after a failed replication, but you're right and that's another email.

--
  Richard Huxton
  Archonet Ltd



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux