Yep, agreed on the random I/O issue.  The larger question is: if you have
a huge table, is it worth reclaiming 3% of the table size each time,
rather than just vacuuming it once it gets to 10% dirty?  I realize the
vacuum is going to take a lot of time, but vacuuming three times to
reclaim 3% each time seems like it is going to be more expensive than
just vacuuming the 10% once.  And vacuuming ten times to reclaim 1% each
time seems even more expensive.  Given your 7% breakeven figure below, a
3% pass costs nearly as much I/O as a full sequential scan, so three of
them cost roughly three full scans, while the single 10% pass costs about
one.  The partial vacuum idea is starting to look like a loser to me
again.

---------------------------------------------------------------------------

Gregory Stark wrote:
> "Bruce Momjian" <bruce@xxxxxxxxxx> writes:
>
> > I agree that index cleanup isn't > 50% of vacuum.  I was trying to
> > figure out how small, and it seems about 15% of the total table, which
> > means if we have bitmap vacuum, we can conceivably reduce vacuum load
> > by perhaps 80%, assuming 5% of the table is scanned.
>
> Actually, no.  A while back I did experiments to see how fast reading a
> file sequentially was compared to reading the same file sequentially but
> skipping x% of the blocks randomly.  The results were surprising (to me)
> and depressing.  The breakeven point was about 7%.
>
> That is, if you assume that only 5% of the table will be scanned and you
> arrange to do it sequentially, then you should expect the I/O to be only
> marginally faster than just reading the entire table.  Vacuum does do
> some CPU work and wouldn't have to consult the clog as often, so it
> would still be somewhat faster.
>
> The theory offered online was that as long as you're reading one page
> from each disk track, you're going to pay the same seek overhead as
> reading the entire track.  I also had some theories involving Linux
> being confused by the seeks and turning off read-ahead, but I could
> never prove them.
>
> In short, to see big benefits you would have to have a much smaller
> percentage of the table being read.  That shouldn't be taken to mean
> that the DSM is a loser.  There are plenty of use cases where tables can
> be extremely large and have only very small percentages that are busy.
> The big advantage of the DSM is that it takes the size of the table out
> of the equation and replaces it with the size of the busy portion of the
> table.  So updating a single record in a terabyte table has the same
> costs as updating a single record in a kilobyte table.
>
> Sadly that's not quite true, due to indexes and due to the size of the
> bitmap itself.  But going back to your numbers, it does mean that if you
> update a single row of a terabyte table then we'll be removing about 85%
> of the I/O (minus the I/O needed to read the DSM, about 0.025%).  If you
> update about 1% then you would be removing substantially less, and once
> you get to about 10% then you're back where you started.
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com

--
  Bruce Momjian  <bruce@xxxxxxxxxx>
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
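
P.S.  For anyone who wants to reproduce the skip-read experiment Greg
describes, below is a minimal C sketch of that kind of test -- my own
illustration, not Greg's actual harness.  The 8 kB block size, fixed
seed, and command-line interface are all assumptions.  It reads a file
in block order but randomly skips a given percentage of blocks; compare
the wall-clock time of a 95%-skip run (reading ~5% of the blocks)
against a 0%-skip run.  You need to drop the OS page cache between runs,
or use a file much larger than RAM, for the timings to mean anything.

    /*
     * skipread.c -- read a file in 8 kB blocks, randomly skipping a
     * given percentage of blocks, and report elapsed wall-clock time.
     *
     * Build:  cc -O2 -o skipread skipread.c
     * Run:    ./skipread /path/to/bigfile 95   (skip 95%, read ~5%)
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>
    #include <time.h>

    #define BLCKSZ 8192             /* same block size Postgres uses */

    int
    main(int argc, char **argv)
    {
        char        buf[BLCKSZ];
        struct stat st;
        struct timespec t0, t1;
        long        nblocks, blocks_read = 0;
        int         fd, skip_pct;

        if (argc != 3)
        {
            fprintf(stderr, "usage: %s file skip-percent\n", argv[0]);
            return 1;
        }
        skip_pct = atoi(argv[2]);

        fd = open(argv[1], O_RDONLY);
        if (fd < 0 || fstat(fd, &st) < 0)
        {
            perror(argv[1]);
            return 1;
        }
        nblocks = st.st_size / BLCKSZ;

        srandom(12345);             /* fixed seed so runs are comparable */
        clock_gettime(CLOCK_MONOTONIC, &t0);

        for (long i = 0; i < nblocks; i++)
        {
            if (random() % 100 < skip_pct)
                continue;           /* skip this block entirely */
            if (pread(fd, buf, BLCKSZ, (off_t) i * BLCKSZ) != BLCKSZ)
                break;
            blocks_read++;
        }

        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("read %ld of %ld blocks in %.2f s\n", blocks_read, nblocks,
               (t1.tv_sec - t0.tv_sec) +
               (t1.tv_nsec - t0.tv_nsec) / 1e9);
        close(fd);
        return 0;
    }

If Greg's numbers hold, the 95%-skip run should finish in nearly the
same wall-clock time as the 0%-skip run, which is the whole problem with
partially vacuuming anything above a few percent of the table.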