At 10:53 AM 2/26/2007, Peter Childs wrote:
On 26/02/07, Pallav Kalva <pkalva@xxxxxxxxxxxxxxxxx> wrote:
Hi,
I am in the process of cleaning up one of our big tables; this table
has 187 million records and we need to delete around 100 million of them.
I am deleting around 4-5 million of them daily in order to keep up
with vacuum and also with the archive log space. So far I have deleted
around 15 million in the past few days.
max_fsm_pages value is set to 1200000. Vacuumdb runs once daily;
here is the output from last night's vacuum job:
=======================================================================================
INFO: free space map: 999 relations, 798572 pages stored; 755424
total pages needed
DETAIL: Allocated FSM size: 1000 relations + 1200000 pages = 7096
kB shared memory.
VACUUM
========================================================================================
The output says 755424 total pages needed; this number keeps
growing even though vacuums run daily. It was around 350K pages
before the delete process started.
I am afraid that this number will soon reach the max_fsm_pages
limit, and vacuums thereafter will never catch up.
Can anyone please explain this behavior? What should I do so that
the daily vacuums catch up?
Vacuum adds free pages to the FSM so that they can be reused. If
you don't fill up those free pages, the FSM will fill up; once the
FSM is full, no more pages can be added to it. If you start writing
to the free pages via inserts, then the next time vacuum runs it
will add the free pages that previously did not fit in the free
space map because it was full.
If you are really deleting that many records, you may be better off
copying the rows you want to keep to a new table and dropping the old
one. To actually recover space you need to either run VACUUM FULL or
CLUSTER.
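A minimal sketch of that copy-and-swap approach (the table name and
the keep predicate are hypothetical):

```sql
BEGIN;
-- Copy only the rows you want to keep into a new table.
CREATE TABLE big_table_new AS
    SELECT * FROM big_table
    WHERE keep_condition;            -- hypothetical predicate

-- Swap: dropping the old table discards all its dead space outright.
DROP TABLE big_table;
ALTER TABLE big_table_new RENAME TO big_table;
COMMIT;
```

Note that CREATE TABLE ... AS does not carry over indexes,
constraints, defaults, or permissions, so those have to be recreated
on the new table afterwards.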
This ought to be in the manual somewhere, as this question gets asked
about once a week.
Peter.
In fact,
a) copying the rows you want to keep to a new table and dropping the
original table,
rather than
b) updating the original table in place,
is a "standard best DBA practice" regardless of DB product.
The only thing that changes from DB product to DB product is the
exact point at which the change is large enough to make "copy,
replace" better than "update in place".
Rule of Thumb: no matter what DB product you are using, if you are
changing more than 1/2 of any table, or more than 1/4 of any table
that does not fit into memory, it's usually better to copy and
replace rather than update in place.
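Applying that rule of thumb to the numbers earlier in this thread
makes the choice clear:

```sql
-- 100 million of 187 million rows are being deleted:
SELECT 100000000.0 / 187000000;   -- about 0.53, i.e. more than 1/2 of the
                                  -- table, so copy/replace wins over
                                  -- deleting in place
```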
...and I completely agree that we should document this sort of
Industry Best Practice in a way that is easily usable by the pg community.
Cheers,
Ron