On Mon, Oct 4, 2010 at 6:47 PM, Jeremy Harris <jgh@xxxxxxxxxxx> wrote:
> On 10/04/2010 04:22 AM, Greg Smith wrote:
>> I had a brain-storming session on this subject with a few of the hackers
>> in the community in this area a while back that I haven't had a chance to
>> do something with yet (it exists only as a pile of scribbled notes so
>> far). There are a couple of ways to collect data on what's in the
>> database and OS cache, and a couple of ways to then expose that data to
>> the optimizer. But that needs to be done very carefully, almost certainly
>> as only a manual process at first, because something that's producing
>> cache feedback all of the time will cause plans to change all the time,
>> too. Where I suspect this is going is that we may end up tracking various
>> statistics over time, then periodically providing a way to export a mass
>> of "typical % cached" data back to the optimizer for use in plan cost
>> estimation purposes. But the idea of monitoring continuously and always
>> planning based on the most recent data available has some stability
>> issues, both from a "too many unpredictable plan changes" and a "bad
>> short-term feedback loop" perspective, as mentioned by Tom and Kevin
>> already.
>
> Why not monitor the distribution of response times, rather than "cached"
> vs. not?
>
> That a) avoids the issue of discovering what was a cache hit, b) deals
> neatly with multilevel caching, and c) feeds directly into cost
> estimation.

I was hot on doing better cache modeling a year or two ago, but the
elephant in the room is that it's unclear that it solves any real-world
problem. The OP is clearly having a problem, but there's not enough
information in his post to say what is actually causing it, and it's
probably not caching effects. We get occasional complaints of the form
"the first time I run this query it's slow, and then after that it's
fast" but, as Craig Ringer pointed out upthread, not too many.
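For what it's worth, Jeremy's response-time idea can be sketched in a few
lines. This is purely illustrative, not how PostgreSQL's planner actually
works, and every name in it is invented: the point is just that a decayed
average of observed block-read latencies per relation folds shared buffers,
the OS cache, and disk into one number that could feed a per-page cost
estimate, without ever deciding what was or wasn't a cache hit.

```python
class LatencyTracker:
    """Hypothetical sketch: track observed block-read latencies per
    relation and derive an effective per-page cost.  Multilevel caching
    shows up in the latency distribution itself, so there is no need to
    discover whether any individual read was a cache hit."""

    def __init__(self, decay=0.99):
        self.decay = decay     # exponential decay so recent behavior dominates
        self.samples = {}      # relation -> (weighted latency sum, weight)

    def observe(self, relation, latency_ms):
        # Age prior observations, then fold in the new sample.
        s, w = self.samples.get(relation, (0.0, 0.0))
        self.samples[relation] = (s * self.decay + latency_ms,
                                  w * self.decay + 1.0)

    def effective_page_cost(self, relation, baseline_ms=8.0):
        """Cost relative to an assumed uncached random read (baseline_ms)."""
        s, w = self.samples.get(relation, (0.0, 0.0))
        if w == 0.0:
            return 1.0         # no observations yet: assume baseline cost
        return (s / w) / baseline_ms
```

A mostly-cached relation whose reads come back in tens of microseconds
would get an effective cost far below 1.0, while a cold relation stays
near 1.0 - which is exactly the continuous feedback whose plan-stability
downsides are being debated above.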
And even with respect to the complaints we do get, it's far from clear
that the cure is any better than the disease. Taking caching effects into
account could easily result in the first execution being slightly less
slow and all of the subsequent executions being moderately slow. That
would not be an improvement for most people. The reports that seem really
painful to me are the ones where people with really big machines complain
of needing HOURS for the cache to warm up, and having the system bogged
down to a standstill until then. But changing the cost model isn't going
to help them either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance