2010/11/11 Robert Haas <robertmhaas@xxxxxxxxx>:
> On Thu, Nov 11, 2010 at 2:35 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> Robert Haas <robertmhaas@xxxxxxxxx> writes:
>>> Yeah.  For Kevin's case, it seems like we want the caching percentage
>>> to vary not so much based on which table we're hitting at the moment
>>> but on how much of it we're actually reading.
>>
>> Well, we could certainly take the expected number of pages to read and
>> compare that to effective_cache_size.  The thing that's missing in
>> that equation is how much other stuff is competing for cache space.
>> I've tried to avoid having the planner need to know the total size of
>> the database cluster, but it's kind of hard to avoid that if you want
>> to model this honestly.
>
> I'm not sure I agree with that.  I mean, you could easily have a
> database that is much larger than effective_cache_size, but only that
> much of it is hot.  Or, the hot portion could move around over time.
> And for reasons of both technical complexity and plan stability, I
> don't think we want to try to model that.  It seems perfectly
> reasonable to say that reading 25% of effective_cache_size will be
> more expensive *per-page* than reading 5% of effective_cache_size,
> independently of what the total cluster size is.
>
>> Would it be at all workable to have an estimate that so many megs of
>> a table are in cache (independently of any other table), and then we
>> could scale the cost based on the expected number of pages to read
>> versus that number?  The trick here is that DBAs really aren't going
>> to want to set such a per-table number (at least, most of the time),
>> so we need a formula to get to a default estimate for that number
>> based on some simple system-wide parameters.  I'm not sure if that's
>> any easier.
>
> That's an interesting idea.  For the sake of argument, suppose we
> assume that a relation smaller than 5% of effective_cache_size is
> fully cached, and that for anything larger, that same amount (5% of
> effective_cache_size) is cached.  Consider a 4GB machine with
> effective_cache_size set to 3GB.  Then we'll assume that any relation
> smaller than ~153MB is 100% cached, a 1GB table is 15% cached, and a
> 3GB table is 5% cached.  That doesn't seem quite right, though: the
> caching percentage drops off very quickly once you exceed the
> threshold.
>
> *thinks*
>
> I'm wondering if we could do something with a formula like
> 3 * amount_of_data_to_read / (3 * amount_of_data_to_read +
> effective_cache_size) = percentage NOT cached.  That is, if we're
> reading an amount of data equal to effective_cache_size, we assume 25%
> caching, and plot a smooth curve through that point.  In the examples
> above, we would assume that a 150MB read is 87% cached, a 1GB read is
> 50% cached, and a 3GB read is 25% cached.

But isn't that already how effective_cache_size is used?  See
index_pages_fetched() in costsize.c.
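Just to spell out the curve you propose (a toy standalone calculation of
my own, nothing that exists in the planner; effective_cache_size = 3GB
as in your example):

#include <stdio.h>

/* Proposed curve: fraction NOT cached = 3x / (3x + effective_cache_size),
 * hence fraction cached = effective_cache_size / (3x + effective_cache_size). */
static double
fraction_cached(double read_mb, double ecs_mb)
{
    return ecs_mb / (3.0 * read_mb + ecs_mb);
}

int
main(void)
{
    double  ecs_mb = 3072.0;    /* effective_cache_size = 3GB, in MB */
    double  reads_mb[] = {150.0, 1024.0, 3072.0};
    int     i;

    for (i = 0; i < 3; i++)
        printf("%6.0f MB read -> %2.0f%% cached\n",
               reads_mb[i], 100.0 * fraction_cached(reads_mb[i], ecs_mb));
    return 0;
}

That does reproduce your 87% / 50% / 25% figures.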
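What index_pages_fetched() already does is estimate how many distinct
pages an index scan will actually fetch, given the number of tuples it
retrieves, the size of the relation, and effective_cache_size.  Roughly
-- and this is only a simplified standalone sketch of the Mackert-Lohman
formula it is based on, with names of my own, not the planner code
itself -- it behaves like this:

#include <stdio.h>

/*
 * Simplified sketch of the Mackert-Lohman estimate: expected number of
 * distinct pages fetched when tuples_fetched tuples are retrieved in
 * random order from a table of rel_pages pages, with cache_pages buffer
 * pages (effective_cache_size) available.
 */
static double
pages_fetched_sketch(double tuples_fetched, double rel_pages,
                     double cache_pages)
{
    double      T = rel_pages;       /* pages in the relation */
    double      b = cache_pages;     /* available cache, in pages */
    double      Ns = tuples_fetched; /* tuples to fetch */
    double      pages;

    if (T <= b)
    {
        /* relation fits in cache: at most T distinct pages are read */
        pages = (2.0 * T * Ns) / (2.0 * T + Ns);
        if (pages > T)
            pages = T;
    }
    else
    {
        double      lim = (2.0 * T * b) / (2.0 * T - b);

        if (Ns <= lim)
            pages = (2.0 * T * Ns) / (2.0 * T + Ns);
        else
            /* cache overflows: additional fetches mostly miss */
            pages = b + (Ns - lim) * (T - b) / T;
    }
    return pages;
}

int
main(void)
{
    /* e.g. fetching 100000 tuples from a 3GB table (393216 pages of
     * 8kB) with effective_cache_size also set to 3GB */
    printf("%.0f pages fetched\n",
           pages_fetched_sketch(100000.0, 393216.0, 393216.0));
    return 0;
}

So effective_cache_size already feeds into the cost through a smooth
curve of this kind: the more pages you expect to touch relative to it,
the more of them are costed as real fetches.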

>> BTW, it seems that all these variants have an implicit assumption
>> that if you're reading a small part of the table it's probably part
>> of the working set; which is an assumption that could be 100% wrong.
>> I don't see a way around it without trying to characterize the data
>> access at an unworkably fine level, though.
>
> Me neither, but I think it will frequently be true, and I'm not sure
> it will hurt very much when it isn't.  I mean, if you execute the
> same query repeatedly, that data will become hot soon enough.  If you
> execute a lot of different queries that each touch a small portion of
> a big, cold table, we might underestimate the costs of the index
> probes, but so what?  There's probably no better strategy for
> accessing that table anyway.  Perhaps you can construct an example
> where this underestimate affects the join order in an undesirable
> fashion, but I'm having a hard time getting worked up about that as a
> potential problem case.  Our current system - where we essentially
> assume that the caching percentage is uniform across the board - can
> have the same problem in less artificial cases.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ ; PostgreSQL : Expertise, Formation et Support