On Mon, Oct 4, 2010 at 6:47 PM, Jeremy Harris <jgh@xxxxxxxxxxx> wrote:
> On 10/04/2010 04:22 AM, Greg Smith wrote:
>> I had a brain-storming session on this subject with a few of the hackers
>> in the community in this area a while back that I haven't had a chance to
>> do something with yet (it exists only as a pile of scribbled notes so
>> far). There are a couple of ways to collect data on what's in the
>> database and OS cache, and a couple of ways to then expose that data to
>> the optimizer. But that needs to be done very carefully, almost certainly
>> as only a manual process at first, because something that's producing
>> cache feedback all of the time will cause plans to change all the time,
>> too. Where I suspect this is going is that we may end up tracking various
>> statistics over time, then periodically providing a way to export a mass
>> of "typical % cached" data back to the optimizer for use in plan cost
>> estimation purposes. But the idea of monitoring continuously and always
>> planning based on the most recent data available has some stability
>> issues, both from a "too many unpredictable plan changes" and a "bad
>> short-term feedback loop" perspective, as mentioned by Tom and Kevin
>> already.
>
> Why not monitor the distribution of response times, rather than "cached"
> vs. not?
>
> That a) avoids the issue of discovering what was a cache hit, b) deals
> neatly with multilevel caching, and c) feeds directly into cost
> estimation.

I was hot on doing better cache modeling a year or two ago, but the
elephant in the room is that it's unclear that it solves any real-world
problem. The OP is clearly having a problem, but there's not enough
information in his post to say what is actually causing it, and it's
probably not caching effects. We get occasional complaints of the form
"the first time I run this query it's slow, and then after that it's
fast" but, as Craig Ringer pointed out upthread, not too many.
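For what it's worth, Jeremy's response-time idea can be sketched in a few
lines. This is purely illustrative, not how PostgreSQL's planner actually
works, and every name in it is invented: the point is just that a decayed
average of observed block-read latencies per relation folds shared buffers,
the OS cache, and disk into one number that could feed a per-page cost
estimate, without ever deciding what was or wasn't a cache hit.

```python
class LatencyTracker:
    """Hypothetical sketch: track observed block-read latencies per
    relation and derive an effective per-page cost.  Multilevel caching
    shows up in the latency distribution itself, so there is no need to
    discover whether any individual read was a cache hit."""

    def __init__(self, decay=0.99):
        self.decay = decay     # exponential decay so recent behavior dominates
        self.samples = {}      # relation -> (weighted latency sum, weight)

    def observe(self, relation, latency_ms):
        # Age prior observations, then fold in the new sample.
        s, w = self.samples.get(relation, (0.0, 0.0))
        self.samples[relation] = (s * self.decay + latency_ms,
                                  w * self.decay + 1.0)

    def effective_page_cost(self, relation, baseline_ms=8.0):
        """Cost relative to an assumed uncached random read (baseline_ms)."""
        s, w = self.samples.get(relation, (0.0, 0.0))
        if w == 0.0:
            return 1.0         # no observations yet: assume baseline cost
        return (s / w) / baseline_ms
```

A mostly-cached relation whose reads come back in tens of microseconds
would get an effective cost far below 1.0, while a cold relation stays
near 1.0 - which is exactly the continuous feedback whose plan-stability
downsides are being debated above.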
And even with respect to the complaints we do get, it's far from clear
that the cure is any better than the disease. Taking caching effects into
account could easily result in the first execution being slightly less
slow and all of the subsequent executions being moderately slow. That
would not be an improvement for most people. The reports that seem really
painful to me are the ones where people with really big machines complain
of needing HOURS for the cache to warm up, and having the system bogged
down to a standstill until then. But changing the cost model isn't going
to help them either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance