Yep, agreed on the random I/O issue.  The larger question is: if you have
a huge table, is it worth reclaiming 3% of the table size each time,
rather than just vacuuming it once it gets to 10% dirty?  I realize the
vacuum is going to take a lot of time, but vacuuming three times to
reclaim 3% each time seems like it is going to be more expensive than
just vacuuming the 10% once.  And vacuuming ten times to reclaim 1% each
time seems even more expensive.  Given your 7% breakeven figure below, a
3% pass costs nearly as much I/O as a full sequential scan, so three of
them cost roughly three full scans, while the single 10% pass costs about
one.  The partial vacuum idea is starting to look like a loser to me
again.

---------------------------------------------------------------------------

Gregory Stark wrote:
> "Bruce Momjian" <bruce@xxxxxxxxxx> writes:
>
> > I agree that index cleanup isn't > 50% of vacuum.  I was trying to
> > figure out how small, and it seems about 15% of the total table, which
> > means if we have bitmap vacuum, we can conceivably reduce vacuum load
> > by perhaps 80%, assuming 5% of the table is scanned.
>
> Actually, no.  A while back I did experiments to see how fast reading a
> file sequentially was compared to reading the same file sequentially but
> skipping x% of the blocks randomly.  The results were surprising (to me)
> and depressing.  The breakeven point was about 7%.
>
> That is, if you assume that only 5% of the table will be scanned and you
> arrange to do it sequentially, then you should expect the I/O to be only
> marginally faster than just reading the entire table.  Vacuum does do
> some CPU work and wouldn't have to consult the clog as often, so it
> would still be somewhat faster.
>
> The theory offered online was that as long as you're reading one page
> from each disk track, you're going to pay the same seek overhead as
> reading the entire track.  I also had some theories involving Linux
> being confused by the seeks and turning off read-ahead, but I could
> never prove them.
>
> In short, to see big benefits you would have to have a much smaller
> percentage of the table being read.  That shouldn't be taken to mean
> that the DSM is a loser.  There are plenty of use cases where tables can
> be extremely large and have only very small percentages that are busy.
> The big advantage of the DSM is that it takes the size of the table out
> of the equation and replaces it with the size of the busy portion of the
> table.  So updating a single record in a terabyte table has the same
> costs as updating a single record in a kilobyte table.
>
> Sadly that's not quite true, due to indexes and due to the size of the
> bitmap itself.  But going back to your numbers, it does mean that if you
> update a single row of a terabyte table then we'll be removing about 85%
> of the I/O (minus the I/O needed to read the DSM, about 0.025%).  If you
> update about 1% then you would be removing substantially less, and once
> you get to about 10% then you're back where you started.
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com

--
  Bruce Momjian  <bruce@xxxxxxxxxx>
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
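
P.S.  For anyone who wants to reproduce the skip-read experiment Greg
describes, below is a minimal C sketch of that kind of test -- my own
illustration, not Greg's actual harness.  The 8 kB block size, fixed
seed, and command-line interface are all assumptions.  It reads a file
in block order but randomly skips a given percentage of blocks; compare
the wall-clock time of a 95%-skip run (reading ~5% of the blocks)
against a 0%-skip run.  You need to drop the OS page cache between runs,
or use a file much larger than RAM, for the timings to mean anything.

    /*
     * skipread.c -- read a file in 8 kB blocks, randomly skipping a
     * given percentage of blocks, and report elapsed wall-clock time.
     *
     * Build:  cc -O2 -o skipread skipread.c
     * Run:    ./skipread /path/to/bigfile 95   (skip 95%, read ~5%)
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>
    #include <time.h>

    #define BLCKSZ 8192             /* same block size Postgres uses */

    int
    main(int argc, char **argv)
    {
        char        buf[BLCKSZ];
        struct stat st;
        struct timespec t0, t1;
        long        nblocks, blocks_read = 0;
        int         fd, skip_pct;

        if (argc != 3)
        {
            fprintf(stderr, "usage: %s file skip-percent\n", argv[0]);
            return 1;
        }
        skip_pct = atoi(argv[2]);

        fd = open(argv[1], O_RDONLY);
        if (fd < 0 || fstat(fd, &st) < 0)
        {
            perror(argv[1]);
            return 1;
        }
        nblocks = st.st_size / BLCKSZ;

        srandom(12345);             /* fixed seed so runs are comparable */
        clock_gettime(CLOCK_MONOTONIC, &t0);

        for (long i = 0; i < nblocks; i++)
        {
            if (random() % 100 < skip_pct)
                continue;           /* skip this block entirely */
            if (pread(fd, buf, BLCKSZ, (off_t) i * BLCKSZ) != BLCKSZ)
                break;
            blocks_read++;
        }

        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("read %ld of %ld blocks in %.2f s\n", blocks_read, nblocks,
               (t1.tv_sec - t0.tv_sec) +
               (t1.tv_nsec - t0.tv_nsec) / 1e9);
        close(fd);
        return 0;
    }

If Greg's numbers hold, the 95%-skip run should finish in nearly the
same wall-clock time as the 0%-skip run, which is the whole problem with
partially vacuuming anything above a few percent of the table.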