Robert Haas <robertmhaas@xxxxxxxxx> writes:
> Yeah.  For Kevin's case, it seems like we want the caching percentage
> to vary not so much based on which table we're hitting at the moment
> but on how much of it we're actually reading.

Well, we could certainly take the expected number of pages to read and
compare that to effective_cache_size.  The thing that's missing in that
equation is how much other stuff is competing for cache space.  I've
tried to avoid having the planner need to know the total size of the
database cluster, but it's kind of hard to avoid that if you want to
model this honestly.

Would it be at all workable to have an estimate that so many megs of a
table are in cache (independently of any other table), and then we
could scale the cost based on the expected number of pages to read
versus that number?  The trick here is that DBAs really aren't going to
want to set such a per-table number (at least, most of the time) so we
need a formula to get to a default estimate for that number based on
some simple system-wide parameters.  I'm not sure if that's any easier.

BTW, it seems that all these variants have an implicit assumption that
if you're reading a small part of the table it's probably part of the
working set; which is an assumption that could be 100% wrong.  I don't
see a way around it without trying to characterize the data access at
an unworkably fine level, though.

			regards, tom lane
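
To make the per-table idea concrete, here is a toy sketch of the kind
of scaling being proposed.  None of this is actual planner code: the
cost constants, the function names, and in particular the
proportional-share formula for defaulting the per-table estimate from
system-wide parameters are all invented for illustration.

    /*
     * Illustrative sketch only -- not planner code.  Given an estimate
     * of how many pages of a table are already in cache, scale the
     * scan cost between a "cached" and an "uncached" page fetch cost.
     */
    #include <stdio.h>

    #define CACHED_PAGE_COST	0.1	/* page found in cache */
    #define UNCACHED_PAGE_COST	1.0	/* page read from disk */

    /*
     * Hypothetical default for the per-table cache estimate: assume
     * the table holds a share of effective_cache_size proportional to
     * its share of the total cluster size -- the "simple system-wide
     * parameters" mentioned above.  Clamp to the table's own size.
     */
    static double
    default_cached_pages(double table_pages, double cluster_pages,
    					 double effective_cache_pages)
    {
    	double		est = (table_pages / cluster_pages) * effective_cache_pages;

    	return (est > table_pages) ? table_pages : est;
    }

    /*
     * Scale the cost: pages expected to be in cache are cheap, the
     * rest pay full price.  Note the implicit assumption criticized
     * above: the pages we happen to read are the cached ones.
     */
    static double
    scan_cost(double pages_to_read, double cached_pages)
    {
    	double		hit = (pages_to_read < cached_pages) ? pages_to_read : cached_pages;
    	double		miss = pages_to_read - hit;

    	return hit * CACHED_PAGE_COST + miss * UNCACHED_PAGE_COST;
    }

    int
    main(void)
    {
    	/* 10000-page table, 200000-page cluster, 50000-page cache */
    	double		cached = default_cached_pages(10000, 200000, 50000);

    	printf("est cached pages: %.0f\n", cached);	/* 2500 */
    	printf("cost for 1000 pages:  %.1f\n", scan_cost(1000, cached));	/* 100.0 */
    	printf("cost for 10000 pages: %.1f\n", scan_cost(10000, cached));	/* 7750.0 */
    	return 0;
    }

The small read is costed as if fully cached while the full-table scan
pays mostly the disk price, which is exactly the behavior the
proposal is after -- and also exactly where the working-set assumption
flagged in the last paragraph can go wrong.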