>-----Original Message----- >From: pgsql-general-owner@xxxxxxxxxxxxxx >[mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Gregory Stark >Sent: maandag 22 januari 2007 19:41 >To: Bruce Momjian >Cc: Heikki Linnakangas; Russell Smith; Darcy Buskermolen; >Simon Riggs; Alvaro Herrera; Matthew T. O'Connor; Pavan >Deolasee; Christopher Browne; pgsql-general@xxxxxxxxxxxxxx; >pgsql-hackers@xxxxxxxxxxxxxx >Subject: Re: [HACKERS] [GENERAL] Autovacuum Improvements > >"Bruce Momjian" <bruce@xxxxxxxxxx> writes: > >> Yep, agreed on the random I/O issue. The larger question is if you >> have a huge table, do you care to reclaim 3% of the table >size, rather >> than just vacuum it when it gets to 10% dirty? I realize the vacuum >> is going to take a lot of time, but vacuuming to relaim 3% >three times >> seems like it is going to be more expensive than just vacuuming the >> 10% once. And vacuuming to reclaim 1% ten times seems even more >> expensive. The partial vacuum idea is starting to look like >a loser to me again. > >Well the answer is of course "that depends". > >If you maintain the dead space at a steady state averaging >1.5% instead of 5% your table is 3.33% smaller on average. If >this is a DSS system that will translate into running your >queries 3.33% faster. It will take a lot of vacuums before >they hurt more than a 3%+ performance drop. Good, this means a DSS system will mostly do table scans (right?). So probably you should witness the 'table scan' statistic and rows fetched aproaching the end of the universe (at least compared to inserts/updates/deletes)? >If it's an OLTP system the it's harder to figure. a 3.33% >increase in data density will translate to a higher cache hit >rate but how much higher depends on a lot of factors. In our >experiments we actually got bigger boost in these kinds of >situations than the I expected (I expected comparable to the >3.33% improvement). So it could be even more than 3.33%. But >like said it depends. >If you already have the whole database cache you won't see any >improvement. If you are right on the cusp you could see a huge benefit. These tables have high insert, update and delete rates, probably a lot of index scans? I believe the workload on table scans should be (close to) none. Are you willing to share some of this measured data? I'm quite interested in such figures. >It sounds like you're underestimating the performance drain >10% wasted space has. If we found out that one routine was >unnecessarily taking 10% of the cpu time it would be an >obvious focus of attention. 10% wasted space is going to work >out to about 10% of the i/o time. > >It also sounds like we're still focused on the performance >impact in absolute terms. I'm much more interested in changing >the performance characteristics so they're predictable and >scalable. It doesn't matter much if your 1kb table is 100% >slower than necessary but it does matter if your 1TB table >needs 1,000x as much vacuuming as your 1GB table even if it's >getting the same update traffic. Or rather, the vacuuming should pay back. A nice metric might be: cost_of_not_vacuuming / cost_of_vacuuming. Obviously, the higher the better. - Joris Dobbelsteen