Re: How does PG know if data is in memory?

"Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> · Tue, 12 Oct 2010 09:35:56 -0500

<gnuoytr@xxxxxxx> wrote:

> An approach that works can be found in DB2, and likely elsewhere.
> 
> The key is that tablespaces/tables/indexes/buffers are all
> attached through the bufferpool (the DB2 term).  A tablespace/
> bufferpool match is defined.  Then tables and indexes are assigned
> to the tablespace (and implicitly, the bufferpool).  As a result,
> one can effectively pin data in memory. This is very useful, but
> not low hanging fruit to implement.

This sounds similar to Sybase named caches.  You can segment off
portions of the memory for specific caches, break that up into space
reserved for different I/O buffer sizes, and bind specific database
objects (tables and indexes) to specific caches.  On the few
occasions where someone had failed to configure the named caches
when setting up a machine, it was caught almost immediately after
deployment because of end-user complaints about poor performance. 
This was so critical to performance for us when we were using
Sybase, that one of my first reactions on finding it missing in
PostgreSQL was distress over the inability to tune as I had.

When I posted to the list about it, the response was that LRU
eviction was superior to any tuning any human would do.  I didn't
and don't believe that, but have found it's close enough in the
PostgreSQL environment to be *way* down my list of performance
issues.  In fact, when looking at the marginal benefits it would
generate in PostgreSQL when done right, versus the number of people
who would shoot themselves in the foot with it, even I have come
around to feeling it's probably not a good idea.

FWIW, the four main reasons for using it were:

(1) Heavily used data could be kept fully cached in RAM and not
driven out by transient activity.

(2) You could flag a cache used for (1) above as using "relaxed LRU
accounting" -- it saved a lot of time tracking repeated references,
leaving more CPU for other purposes.

(3) Each named cache had its own separate set of locks, reducing
contention.

(4) Large tables for which the heap was often were scanned in its
entirety or for a range on the clustered index could be put in a
relatively small cache with large I/O buffers.  This avoided blowing
out the default cache space for situations which almost always
required disk I/O anyway.

None of that is anything for amateurs to play with.  You need to set
up caches like that based on evidence from monitoring and do careful
benchmarking of the results to actually achieve improvements over
LRU logic.

> The introduction of rudimentary tablespaces is a first step.  I
> assumed that the point was to get to a DB2-like structure at some
> point.  Yes?

As far as I can tell, there is nobody with that intent.

-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance