Re: lru_multiplier and backend page write-outs

On Thu, 6 Nov 2008, Peter Schuller wrote:

>> In order to keep it from using up the whole cache with maintenance
>> overhead, vacuum allocates a 256K ring of buffers and re-uses ones
>> from there whenever possible.
>
> no table was ever large enough that 256k buffers would ever be filled
> by the process of vacuuming a single table.

Not 256K buffers--256KB, which works out to 32 buffers at the default 8KB block size.
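
To put the ring idea in concrete terms, here is a minimal Python sketch of a fixed-size buffer access strategy. The names are purely illustrative (this is not PostgreSQL's actual code); it just shows how a 32-slot ring lets a bulk scan recycle the same 256KB over and over instead of pushing everything else out of the shared cache:

```python
# Illustrative sketch only: a 32-slot ring (256KB at 8KB per buffer) that a
# bulk operation such as vacuum can reuse, so it does not flood the shared
# buffer cache with pages it will only touch once.

BLOCK_SIZE = 8192          # PostgreSQL's default block size
RING_BUFFERS = 32          # 32 * 8KB = 256KB ring

class BufferRing:
    """Hypothetical fixed-size ring of buffer slots."""
    def __init__(self, size=RING_BUFFERS):
        self.slots = [None] * size   # each slot holds a (relation, block) tag
        self.next = 0                # next slot to recycle, round-robin

    def read_block(self, relation, block_no):
        # Reuse the oldest slot in the ring rather than asking the shared
        # cache for a new victim buffer.
        slot = self.next
        self.slots[slot] = (relation, block_no)
        self.next = (self.next + 1) % len(self.slots)
        return slot

if __name__ == "__main__":
    ring = BufferRing()
    # Scanning 1000 blocks of a table touches only 32 distinct slots.
    used = {ring.read_block("big_table", b) for b in range(1000)}
    print(f"blocks scanned: 1000, ring slots used: {len(used)}")
```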

> In addition, when I say "constantly" above I mean that the count
> increases even between successive SELECTs (of the stat table) with
> only a second or two in between.

Writes to the database when only doing read operations are usually related to hint bits: http://wiki.postgresql.org/wiki/Hint_Bits
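
If you want to watch those write-outs from the SQL level, sampling pg_stat_bgwriter and comparing the counters is the usual approach; the delta in buffers_backend is the number of interest here. A rough sketch, assuming psycopg2 and a placeholder connection string:

```python
# Rough illustration: sample pg_stat_bgwriter twice and print the deltas.
# The connection string ("dbname=postgres") is a placeholder for your setup.
import time
import psycopg2

QUERY = """
    SELECT buffers_checkpoint, buffers_clean, buffers_backend
    FROM pg_stat_bgwriter
"""

def sample(cur):
    cur.execute(QUERY)
    return cur.fetchone()

conn = psycopg2.connect("dbname=postgres")
cur = conn.cursor()

before = sample(cur)
time.sleep(5)          # run your read-only workload in another session here
after = sample(cur)

labels = ("buffers_checkpoint", "buffers_clean", "buffers_backend")
for name, b, a in zip(labels, before, after):
    print(f"{name}: +{a - b}")

cur.close()
conn.close()
```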

> On this topic btw, was it considered to allow the administrator to
> specify a fixed-size margin to use when applying the JIT policy?

Right now, there's no way to know exactly what's in the buffer cache without scanning the individual buffers, which requires locking their headers so you can see them consistently. No one process can get the big picture without doing something intrusive like that, and on a busy system the overhead of collecting more data to know exactly how far ahead the cleaning is can drag down overall performance. A lot can happen while the background writer is sleeping.
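
That intrusive scan is essentially what the contrib pg_buffercache module does when you query it, which is why it's not something you'd want to poll constantly on a production server. A sketch of such a query (assuming the module is installed; the join on relfilenode is simplified):

```python
# Sketch, assuming the contrib pg_buffercache module is installed: this is
# exactly the kind of buffer-header scan described above, so run it sparingly.
import psycopg2

conn = psycopg2.connect("dbname=postgres")   # placeholder connection string
cur = conn.cursor()
cur.execute("""
    SELECT c.relname,
           count(*) AS buffers,
           sum(CASE WHEN b.isdirty THEN 1 ELSE 0 END) AS dirty
    FROM pg_buffercache b
    JOIN pg_class c ON c.relfilenode = b.relfilenode
    WHERE b.reldatabase = (SELECT oid FROM pg_database
                           WHERE datname = current_database())
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10
""")
for relname, buffers, dirty in cur.fetchall():
    print(f"{relname}: {buffers} buffers, {dirty} dirty")
cur.close()
conn.close()
```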

One next-generation design which has been sketched out but not even prototyped would take cleaned buffers and add them to the internal list of buffers that are free, which right now is usually empty on the theory that cached data is always more useful than a reserved buffer. If you developed a reasonable model for how many buffers you needed and padded that appropriately, that's the easiest way (given the rest of the buffer manager code) to get close to ensuring there aren't any backend writes. Because you've got the OS buffering writes anyway in most cases, though, it's hard to pin down whether that would actually improve worst-case latency. And moving in that direction always seems to reduce average throughput, even in write-heavy benchmarks.
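
A toy model of that design, purely hypothetical (none of these names exist in the PostgreSQL source): the cleaner estimates upcoming buffer demand from recent history, pads it, and fills a free list so backends rarely have to write anything themselves.

```python
# Hypothetical sketch of the "clean ahead and feed a free list" idea
# described above; none of these names correspond to real PostgreSQL code.
from collections import deque
import random

class BufferPool:
    """Toy model: a background cleaner feeds a free list ahead of demand."""
    def __init__(self, nbuffers):
        self.dirty = set(range(nbuffers))      # buffers that still need a write
        self.free_list = deque()               # cleaned buffers ready for reuse
        self.alloc_history = deque(maxlen=16)  # allocations per recent interval
        self.backend_writes = 0                # times a backend had to clean

    def clean_ahead(self, padding=2.0):
        # Clean enough buffers to cover a padded estimate of upcoming demand.
        if self.alloc_history:
            expected = sum(self.alloc_history) / len(self.alloc_history)
        else:
            expected = 1
        target = int(expected * padding)
        while len(self.free_list) < target and self.dirty:
            buf = self.dirty.pop()             # pretend to write it out
            self.free_list.append(buf)

    def run_interval(self, allocations):
        # Backends allocate buffers; if the free list is empty they must
        # write out a dirty buffer themselves (the case we want to avoid).
        for _ in range(allocations):
            if self.free_list:
                self.free_list.popleft()
            elif self.dirty:
                self.dirty.pop()
                self.backend_writes += 1
        self.alloc_history.append(allocations)

if __name__ == "__main__":
    pool = BufferPool(nbuffers=10000)
    for _ in range(100):
        pool.clean_ahead(padding=2.0)
        pool.run_interval(random.randint(5, 30))
    print("backend writes:", pool.backend_writes)
```

The padding factor plays the same role the fixed-size margin asked about above would: the larger it is, the fewer backend writes, at the cost of cleaning buffers that may never be needed.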

The important thing to remember is that the underlying OS has its own read and write caching mechanisms here, and unless the PostgreSQL ones are measurably better than those you might as well let the OS manage the problem instead. It's easy to demonstrate that's happening when you give a decent amount of memory to shared_buffers; it's much harder to prove that's the case for an improved write scheduling algorithm. Stepping back a bit, you might even consider that one reason PostgreSQL has scaled as well as it has is exactly because it's been riding improvements in the underlying OS in many of these cases, rather than trying to do all the I/O scheduling itself.

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

