>>> On Fri, Dec 14, 2007 at 5:59 PM, in message
<15401.1197676791@xxxxxxxxxxxxx>, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> "Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> writes:
>> Why double writes per read, plus massive writes at checkpoint?
>
> The double writes aren't surprising: freezing has to be WAL-logged,
> and the odds are that each page hasn't been touched since the last
> checkpoint, so the WAL log will include a complete page image.
> So in the steady state where all shared buffers are dirty, the
> per-page cycle is:
> * write out a dirty buffer so it can be reclaimed
> * read in a page
> * modify it to mark tuples frozen
> * write an image of the page to WAL
> * leave the now-dirty page in shared buffers for later writing
>
> The checkpoint spikes would come from trying to flush out all the
> dirty buffers at once.

Got it. Thanks.

> You'd expect a bit of a valley after each peak, since the vacuum
> could presumably recycle some buffers without having to flush 'em
> first; but I don't see one in your data. That may just be because
> the numbers are too noisy, but I kinda suspect that the vacuum is
> dirtying buffers nearly as fast as the bgwriter can clean them,
> leaving not a lot of daylight for a valley.

Yeah, the pattern was pretty consistent and without valleys.

> 8.3 should pretty well eliminate the checkpoint spike in this scenario,
> because vacuum will work in a limited number of shared buffers instead
> of dirtying the whole cache. But you'll still see 2X writes over reads.

Testing 8.3beta4 so far has shown both smoother I/O and better
performance in all respects. The preliminary post I did, where I
thought I saw some regression on loading a pg_dump, turned out to be
an "apples to oranges" comparison; comparing the same load on the
same hardware and OS, 8.3 wins. (Kudos to all who worked on these
improvements!)

> If this is data that you could re-generate at need, it might make sense
> to turn off full_page_writes during the initial data load and vacuum.

Thanks for the suggestions; I'll try that.

> I concur with trying to FREEZE all the data while you do this, else
> you'll see the same work done whenever the data happens to slip past
> the auto freeze threshold.

Thanks. I thought that made sense, but I'm still trying to get my
head around some of the dynamics of PostgreSQL and MVCC. I'll
suggest that as policy here.

-Kevin
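
P.S. For the archives, here is a rough sketch of the load procedure I
have in mind based on Tom's suggestions. The table name and file path
are placeholders, and note that full_page_writes only needs a config
reload (not a restart) to take effect:

    # postgresql.conf, before the bulk load (then pg_ctl reload):
    full_page_writes = off

    -- in psql: load the regenerable data, then freeze it so the
    -- rows never slip past the autovacuum freeze threshold later
    -- ("mytable" and the path are placeholders)
    COPY mytable FROM '/path/to/mytable.dat';
    VACUUM FREEZE mytable;

    # postgresql.conf, after the load (then pg_ctl reload):
    full_page_writes = on

Since the data can be re-generated at need, risking a torn page during
the load seems acceptable; turning full_page_writes back on afterward
restores the normal crash protection for regular operation.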