Hello,

> At one point I envisioned making it smart enough to try and handle the
> scenario you describe--on an idle system, you may very well want to write
> out dirty and recently accessed buffers if there's nothing else going on.
> But such behavior is counter-productive on a busy system, which is why a
> similar mechanism that existed before 8.3 was removed. Making that only
> happen when idle requires a metric for what "busy" means, which is tricky
> to do given the information available to this particular process.
>
> Short version: if you never fill the buffer cache, buffers_clean will
> always be zero, and you'll only see writes by checkpoints and things not
> operating with the standard client buffer allocation mechanism. Which
> brings us to...

Sure. I am not really out to get the background writer to pre-emptively do
"idle trickling". Though I can see cases where one might care about this
(such as lessening the impact of OS buffer cache delays on checkpoints),
it's not what I am after now.

> > One theory: Is it the auto vacuum process? Stracing those I've seen
> > that they very often do writes directly to disk.
>
> In order to keep it from using up the whole cache with maintenance
> overhead, vacuum allocates a 256K ring of buffers and re-uses ones from
> there whenever possible. That will generate buffers_backend writes when
> that ring fills but it has more left to scan. Your theory that all the
> backend writes are coming from vacuum seems consistent with what you've
> described.

The bit that is inconsistent with this theory, given the above ring buffer
description, is that I saw the backend write-out count increasing
constantly during the write activity I was generating to the database.
However (because in this particular case it was a small database used for
some latency-related testing), no table was ever large enough that
vacuuming a single table would fill the 256K ring. Most tables were likely
between a handful and a couple of hundred pages in size.

In addition, when I say "constantly" above I mean that the count increases
even between successive SELECTs (of the stats view) with only a second or
two in between. In the absence of long-running vacuums, that discounts
vacuuming, because the autovacuum naptime is 1 minute. In fact this
already discounted vacuuming even without the added information you
provided above, but I didn't realize that when originally posting. The
reason I mentioned vacuuming was that the use case is such that we do have
a lot of tables constantly getting writes and updates, but they are all
small.

Is anything else known that might be generating the writes, if it is not
vacuuming?

> You might even want to drop the two background writer parameters you've
> tweaked upwards back down closer to their original values. I get the
> impression you might have increased those hoping for more background
> writer work because you weren't seeing any. If you ever do get to where
> your buffer cache is full and the background writer starts doing
> something, those could jump from ineffective to wastefully heavy at that
> point.

I tweaked them in order to eliminate backends having to do "synchronous"
writes (synchronous with respect to the operating system, even if not with
respect to the underlying device). The idea is that writes to the
operating system are less well understood/controlled in terms of any
latency they may cause.
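For reference, the counters I am watching are the ones in the
pg_stat_bgwriter view (column names as in 8.3); something along these
lines, sampled a second or two apart:

    -- Sample the background writer statistics (PostgreSQL 8.3).
    -- buffers_clean stays at zero until the buffer cache fills up;
    -- buffers_backend counts writes performed directly by backends.
    SELECT buffers_checkpoint,
           buffers_clean,
           buffers_backend,
           buffers_alloc
    FROM pg_stat_bgwriter;

It is the difference in buffers_backend between two such samples that
keeps growing in the scenario described above.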
It would be very nice if the backend writes were always zero under normal
circumstances (or at least grew very rarely, in edge cases where the JIT
policy did not succeed), in order to make systematically increasing
backend write-outs a more relevant and rare observation.

On this topic, by the way: was it considered to allow the administrator to
specify a fixed-size margin to use when applying the JIT policy? (The JIT
policy and logic itself would remain exactly the same.) Especially with
larger buffer caches, that would perhaps allow the administrator to make a
call to truly eliminate synchronous writes during normal operation, while
not adversely affecting anything (if the buffer cache is 1 GB, having a
margin of say 50 MB does not really matter much in terms of wasted memory,
yet could have a significant impact on eliminating synchronous
write-outs).

On a system where you really want to keep backend writes at exactly 0
under normal circumstances (discounting vacuuming), and which has a large
buffer cache (say the one gig), it might be nice to be able to say: "I
have 1 GB of buffer cache; for the purposes of the JIT algorithm, please
pretend it's only 900 MB." The result is a constantly sized margin of
roughly 100 MB with respect to ensuring writes are asynchronous.
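To put a number on that margin (purely illustrative -- such a knob does
not exist today), the roughly 100 MB of cushion in the one-gig example
works out as follows at the default 8 kB block size:

    -- Hypothetical illustration only; a fixed JIT margin is not an
    -- existing setting. A 100 MB margin at 8 kB per buffer comes to:
    SELECT 100 * 1024 / 8 AS margin_buffers;  -- 12800 buffers of cushion

Relative to the 131072 buffers of a 1 GB cache, that is under 10% of the
cache "wasted" in exchange for backends essentially never having to write
synchronously.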
--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@xxxxxxxxxxxx>'
Key retrieval: Send an E-Mail to getpgpkey@xxxxxxxxx
E-Mail: peter.schuller@xxxxxxxxxxxx
Web: http://www.scode.org