Re: Limit of bgwriter_lru_maxpages of max. 1000?

Greg Smith <gsmith@xxxxxxxxxxxxx> · Mon, 5 Oct 2009 16:40:44 -0400 (EDT)

On Mon, 5 Oct 2009, Gerhard Wiesinger wrote:

I think the problem is that it is done at checkpoint time (whether 
spread or not). It should have already been done by bgwriter.

This is pretty simple:  if you write things before checkpoint time, you'll 
end up re-writing a percentage of the blocks if they're re-dirtied before 
the checkpoint actually happens.  The checkpoint itself is always the most 
efficient time to write something out.  People think that the background 
writer should do more, but it can't without generating more writes than if 
you focused on spreading the checkpoints out instead.  This is why 
the only work the BGW does try to do is writing out blocks that it's 
pretty sure are going to be evicted very soon (in the next 200ms, or 
whatever its cycle time is set to), to minimize the potential for 
mistakes.  The design errs a bit on the side of doing too little because 
it is paranoid about doing wasted work, and in benchmarks that 
implementation always beat one where the background writer was more 
aggressive.
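
To make the re-dirtying argument concrete, here is a toy simulation. It is 
only a sketch under stated assumptions, not a model of the real bgwriter: 
pages are dirtied uniformly at random, and the "eager" writer is a 
hypothetical one that flushes every dirty page at a fixed interval. The 
function name and all parameters are invented for illustration.

```python
import random


def simulate(num_pages=1000, updates=5000, early_write_every=None, seed=0):
    """Count page writes during one checkpoint interval (toy model).

    early_write_every: if set, a hypothetical eager writer flushes all
    currently dirty pages after every N updates. This is an assumption
    for illustration, not how PostgreSQL's background writer works.
    """
    rng = random.Random(seed)
    dirty = set()
    writes = 0
    for i in range(1, updates + 1):
        # An update dirties one randomly chosen page.
        dirty.add(rng.randrange(num_pages))
        if early_write_every and i % early_write_every == 0:
            # Eager flush: every dirty page is written now, and may be
            # dirtied and written again before the checkpoint.
            writes += len(dirty)
            dirty.clear()
    # The checkpoint writes whatever is still dirty at the end.
    writes += len(dirty)
    return writes


checkpoint_only = simulate()
eager = simulate(early_write_every=500)
print("writes with checkpoint only:", checkpoint_only)
print("writes with eager flushing: ", eager)
```

With the same update sequence, the checkpoint-only run writes each dirty 
page at most once, while the eager run can never write fewer pages and 
usually writes many more, since a page re-dirtied after a flush has to be 
written again.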

This is hard for people to accept, but there were three of us running 
independent tests to improve things here by the end of 8.3 development and 
everybody saw similar results as far as the checkpoint spreading approach 
being the right one.  At the time the patch was labeled "load distributed 
checkpoint" and if I had more time today I'd try and find the more 
interesting parts of that discussion to highlight them.

BTW: Is it possible to get everything in pg_class over all databases as 
admin?

Scott's message at 
http://archives.postgresql.org/pgsql-general/2009-09/msg00986.php 
summarizes the problem nicely, and I suggested my workaround for it at 
http://archives.postgresql.org/pgsql-general/2009-09/msg00984.php

Bug2: Double iteration of buffers
As you can see in the call tree below, there is a double iteration over 
the buffers. This might be a major performance bottleneck.

Hmmm, this might be a real bug causing scans through the buffer cache to go 
twice as fast as intended.

That's not twice, O(2*n)=O(n); that's a factor of n*n (outer and inner 
loop iteration), which means the overall cost is O(n^2), which is IMHO 
too much.

I follow what you mean; I didn't notice that.  SyncOneBuffer isn't an O(n) 
operation; it's O(1).  So I'd think the potential bug here turns into an 
O(n) issue, given that it's the routine being called n times.
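
The difference between the two readings can be sketched by counting steps. 
This is not PostgreSQL code; count_steps and per_call_cost are hypothetical 
names used only to illustrate the arithmetic of a loop that calls a routine 
n times.

```python
def count_steps(n, per_call_cost):
    """Total steps for an outer loop that calls a routine once per buffer.

    per_call_cost(n) returns how many steps a single call takes; both
    names are made up for this illustration.
    """
    steps = 0
    for _ in range(n):
        steps += per_call_cost(n)
    return steps


n = 1000
# If each call is O(1), n calls cost O(n) overall.
constant_per_call = count_steps(n, lambda size: 1)
# If each call scanned all n buffers, n calls would cost O(n^2).
linear_per_call = count_steps(n, lambda size: size)
print(constant_per_call, linear_per_call)
```

With a constant-time routine the total grows linearly with the number of 
buffers; the quadratic case only arises if the routine itself walks the 
whole buffer pool on every call.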

This seems like a job for "dump things to the log file" style debugging. 
If I can reproduce an actual bug here it sounds like a topic for the 
hackers list outside of this discussion.

The problem might be hidden for the following reasons:
1.) Buffer values are so low that even n^2 is small for today's machines
2.) Code is not often called in that way
3.) backend writes out pages so that the code is never executed

(2) was the reason I figured it might have escaped notice.  It's really 
not called that often in a way that would run into the problem you think 
is there.

Do you have an idea where one should set tracepoints inside and outside 
PostgreSQL?

I think you'd want to instrument BufferAlloc inside bufmgr.c to measure 
what you're after.

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD
