2010/10/4 Greg Smith <greg@xxxxxxxxxxxxxxx>:
> Craig Ringer wrote:
>>
>> If some kind of cache awareness was to be added, I'd be interested in
>> seeing a "hotness" measure that tracked how heavily a given
>> relation/index has been accessed and how much has been read from it
>> recently. A sort of age-scaled blocks-per-second measure that includes
>> both cached and uncached (disk) reads. This would let the planner know
>> how likely parts of a given index/relation are to be cached in memory
>> without imposing the cost of tracking the cache in detail. I'm still
>> not sure it'd be all that useful, though...
>
> Yup, that's one of the design ideas scribbled in my notes, as is the idea
> of what someone dubbed a "heat map" that tracked which parts of the
> relation were actually the ones in RAM, the other issue you mentioned.
> The problem facing a lot of development possibilities in this area is
> that we don't have any continuous benchmarking of complicated plans
> going on right now. So if something really innovative is done, there's
> really no automatic way to test the result and then see what types of
> plans it improves and what it makes worse. Until there's some better
> performance regression work like that around, development on the
> optimizer has to favor being very conservative.

* Tracking a specific block is not very easy because of readahead: you
end up measuring whether a block was in memory at the moment you requested
it physically, not at the moment the first seek/fread happened. It is
still an interesting stat, imho; I wonder how it can add value to the
planner.

* If the planner knows more about the OS cache, it can guess
effective_cache_size on its own, which is probably already nice to have.

Extract from the postgres code:

 * We use an approximation proposed by Mackert and Lohman, "Index Scans
 * Using a Finite LRU Buffer: A Validated I/O Model", ACM Transactions
 * on Database Systems, Vol. 14, No. 3, September 1989, Pages 401-424.
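To make the discussion concrete, here is a rough Python sketch of what
that Mackert-Lohman approximation computes, modeled on
index_pages_fetched() in src/backend/optimizer/path/costsize.c (the
proportional cache-share step is simplified from the C original, and the
parameter names are mine, not PostgreSQL's):

```python
import math

def index_pages_fetched(tuples_fetched, pages, index_pages,
                        total_table_pages, effective_cache_size):
    """Estimate pages actually fetched, per Mackert & Lohman (1989).

    tuples_fetched       -- Ns: number of tuples the scan will fetch
    pages                -- T: pages in the relation being scanned
    index_pages          -- pages in the index used for the scan
    total_table_pages    -- total pages of all tables in the query
    effective_cache_size -- assumed cache size, in pages (b is derived
                            as this relation's proportional share of it)
    """
    T = max(float(pages), 1.0)
    total_pages = max(float(total_table_pages) + index_pages, 1.0)
    # This relation's proportional share of the cache:
    b = max(effective_cache_size * T / total_pages, 1.0)
    Ns = float(tuples_fetched)

    if T <= b:
        # Whole relation fits in cache: pages fetched saturates at T.
        pages_fetched = min((2.0 * T * Ns) / (2.0 * T + Ns), T)
    else:
        lim = (2.0 * T * b) / (2.0 * T - b)
        if Ns <= lim:
            pages_fetched = (2.0 * T * Ns) / (2.0 * T + Ns)
        else:
            # Past the cache limit, extra fetches mostly miss.
            pages_fetched = b + (Ns - lim) * (T - b) / T
    return math.ceil(pages_fetched)
```

Note how b, the assumed buffer size, is exactly where a real measurement
of the OS page cache could be plugged in instead of the
effective_cache_size guess.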
The planner uses that in conjunction with effective_cache_size to decide
whether it is interesting to scan the index. The question is whether this
model is still valid given a more precise knowledge of the OS page cache,
and also whether it matches how different systems like Windows and Linux
handle their page cache.

Hooks around cost estimation should help with writing a module to rethink
that part of the planner and make it use the statistics about cache. I
wonder whether adding such hooks to core would impact its performance?
Anyway, doing that is probably the easier and shorter way to test the
behavior.

> --
> Greg Smith, 2ndQuadrant US  greg@xxxxxxxxxxxxxxx  Baltimore, MD
> PostgreSQL Training, Services and Support  www.2ndQuadrant.us
> Author, "PostgreSQL 9.0 High Performance"  Pre-ordering at:
> https://www.packtpub.com/postgresql-9-0-high-performance/book

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Training and Support

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance