Re: How does PG know if data is in memory?

Fabrício dos Anjos Silva wrote:
After reading lots of documentation, I still don't fully understand how PG knows whether some needed data is in memory or in secondary storage. While choosing the best query plan, the optimizer must take this into account. Does PG consider this? If so, how does it know?

There are actually two different questions here, and I'm not sure the discussion so far has completely clarified both of them for you.

PostgreSQL has its own dedicated pool of memory, sized by shared_buffers. When you request a page, if it's already in that pool, you are pointed at the existing copy and no physical I/O happens. If it's not, PostgreSQL asks the operating system to read the page. That read may be satisfied from the OS cache rather than from disk, but the database currently has no way of knowing when that happens. The hit percentage counters in the database therefore reflect only shared_buffers hits, not OS cache hits. Some work has been done to integrate OS cache information into the database; the current leading project in that area is pgfincore: http://pgfoundry.org/projects/pgfincore/
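You can see those shared_buffers hit counters in the standard statistics views. A minimal sketch using pg_stat_database (restricting to the current database is just one choice):

```sql
-- Fraction of block requests satisfied from shared_buffers.
-- blks_read counts reads handed to the OS; some of those the OS
-- cache may have served without touching disk, and the database
-- cannot tell the difference.
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4)
         AS shared_buffers_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();
```

pg_statio_user_tables gives the same style of breakdown per table (heap_blks_hit vs. heap_blks_read) if you want to drill down.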

However, none of this information is considered by the query optimizer at all. It makes plans without any knowledge of what is or isn't in RAM right now, in either the dedicated database memory or the OS cache; only the ratios between the planner cost constants are really considered. You can set those on a query-by-query basis to provide subtle hints when you know something the planner doesn't, but you have to be very careful about doing that, because plans tuned that way tend to become obsolete as the data and its cache state change.
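One way to apply such a hint to a single query is SET LOCAL inside a transaction, so the change doesn't leak into the rest of the session. A sketch (the table and the specific values here are just examples):

```sql
BEGIN;
-- Tell the planner random page reads are cheap for this query,
-- because we expect the relevant index to be mostly cached;
-- the default random_page_cost is 4.0.
SET LOCAL random_page_cost = 1.5;
SET LOCAL effective_cache_size = '6GB';
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
COMMIT;
```

SET LOCAL reverts automatically at commit or rollback, which is exactly why it's safer than a plain SET for this kind of experiment.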

I had a brain-storming session on this subject with a few of the hackers in the community a while back that I haven't had a chance to act on yet (it exists only as a pile of scribbled notes so far). There are a couple of ways to collect data on what's in the database and OS caches, and a couple of ways to then expose that data to the optimizer. But that needs to be done very carefully, almost certainly as a manual process at first, because something that produces cache feedback all of the time will cause plans to change all of the time, too. Where I suspect this is going is that we may end up tracking various statistics over time, then periodically providing a way to export a mass of "typical % cached" data back to the optimizer for use in plan cost estimation. But the idea of monitoring continuously and always planning based on the most recent data available has stability issues, both from a "too many unpredictable plan changes" perspective and a "bad short-term feedback loop" one, as Tom and Kevin have already mentioned.

--
Greg Smith, 2ndQuadrant US greg@xxxxxxxxxxxxxxx Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"    Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

