Re: How does PG know if data is in memory?

Cédric Villemain <cedric.villemain.debian@xxxxxxxxx> · Mon, 4 Oct 2010 22:39:40 +0200

2010/10/4 Greg Smith <greg@xxxxxxxxxxxxxxx>:
> Craig Ringer wrote:
>>
>> If some kind of cache awareness was to be added, I'd be interested in
>> seeing a "hotness" measure that tracked how heavily a given relation/index
>> has been accessed and how much has been read from it recently. A sort of
>> age-scaled blocks-per-second measure that includes both cached and uncached
>> (disk) reads. This would let the planner know how likely parts of a given
>> index/relation are to be cached in memory without imposing the cost of
>> tracking the cache in detail. I'm still not sure it'd be all that useful,
>> though...
>
> Yup, that's one of the design ideas scribbled in my notes, as is the idea of
> what someone dubbed a "heat map" that tracked which parts of the relation
> where actually the ones in RAM, the other issue you mentioned.  The problem
> facing a lot of development possibilities in this area is that we don't have
> any continuous benchmarking of complicated plans going on right now.  So if
> something really innovative is done, there's really no automatic way to test
> the result and then see what types of plans it improves and what it makes
> worse.  Until there's some better performance regression work like that
> around, development on the optimizer has to favor being very conservative.

* tracking specific block is not very easy because of readahead. You
end-up measuring exactly if a block was in memory at the moment you
requested it physicaly, not at the moment the first seek/fread happen.
It is still interesting stat imho.

I wonder how that can add value to the planner.

* If the planner knows more about the OS cache it can guess the
effective_cache_size on its own, which is probably already nice to
have.

Extract from postgres code:
 * We use an approximation proposed by Mackert and Lohman, "Index Scans
 * Using a Finite LRU Buffer: A Validated I/O Model", ACM Transactions
 * on Database Systems, Vol. 14, No. 3, September 1989, Pages 401-424.

Planner use that in conjunction with effective_cache_size to guess if
it is interesting to scan the index.
All is to know if this model is still valid in front of a more precise
knowledge of the OS page cache... and also if it matches how different
systems like windows and linux handle page cache.

Hooks around cost estimation should help writing a module to rethink
that part of the planner and make it use the statistics about cache. I
wonder if adding such hooks to core impact its  performances ? Anyway
doing that is probably the easier and shorter way to test the
behavior.

>
> --
> Greg Smith, 2ndQuadrant US greg@xxxxxxxxxxxxxxx Baltimore, MD
> PostgreSQL Training, Services and Support  www.2ndQuadrant.us
> Author, "PostgreSQL 9.0 High Performance"    Pre-ordering at:
> https://www.packtpub.com/postgresql-9-0-high-performance/book
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/ ;    PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance