---- Original message ----
>Date: Thu, 11 Nov 2010 15:29:40 -0500
>From: pgsql-performance-owner@xxxxxxxxxxxxxx (on behalf of Robert Haas <robertmhaas@xxxxxxxxx>)
>Subject: Re: anti-join chosen even when slower than old plan
>To: Tom Lane <tgl@xxxxxxxxxxxxx>
>Cc: Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx>, Mladen Gogala <mladen.gogala@xxxxxxxxxxx>, "pgsql-performance@xxxxxxxxxxxxxx" <pgsql-performance@xxxxxxxxxxxxxx>
>
>On Thu, Nov 11, 2010 at 2:35 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> Robert Haas <robertmhaas@xxxxxxxxx> writes:
>>> Yeah. For Kevin's case, it seems like we want the caching percentage
>>> to vary not so much based on which table we're hitting at the moment
>>> but on how much of it we're actually reading.
>>
>> Well, we could certainly take the expected number of pages to read and
>> compare that to effective_cache_size. The thing that's missing in that
>> equation is how much other stuff is competing for cache space. I've
>> tried to avoid having the planner need to know the total size of the
>> database cluster, but it's kind of hard to avoid that if you want to
>> model this honestly.
>
>I'm not sure I agree with that. I mean, you could easily have a
>database that is much larger than effective_cache_size, but only that
>much of it is hot. Or the hot portion could move around over time.
>And for reasons of both technical complexity and plan stability, I
>don't think we want to try to model that. It seems perfectly
>reasonable to say that reading 25% of effective_cache_size will be
>more expensive *per-page* than reading 5% of effective_cache_size,
>independently of what the total cluster size is.
>
>> Would it be at all workable to have an estimate that so many megs of a
>> table are in cache (independently of any other table), and then we
>> could scale the cost based on the expected number of pages to read
>> versus that number? The trick here is that DBAs really aren't going
>> to want to set such a per-table number (at least, most of the time),
>> so we need a formula to get to a default estimate for that number
>> based on some simple system-wide parameters. I'm not sure if that's
>> any easier.
>
>That's an interesting idea. For the sake of argument, suppose we
>assume that a relation smaller than 5% of effective_cache_size is
>fully cached, and that anything larger has just that much of it (5% of
>effective_cache_size) cached. Consider a 4GB machine with
>effective_cache_size set to 3GB. Then we'd assume that any relation
>smaller than 153MB is 100% cached, a 1GB table is 15% cached, and a
>3GB table is 5% cached. That doesn't seem quite right, though: the
>caching percentage drops off very quickly after you exceed the
>threshold.
>
>*thinks*
>
>I'm wondering if we could do something with a formula like
>3 * amount_of_data_to_read / (3 * amount_of_data_to_read +
>effective_cache_size) = percentage NOT cached. That is, if we're
>reading an amount of data equal to effective_cache_size, we assume 25%
>caching, and plot a smooth curve through that point. In the examples
>above, we would assume that a 150MB read is 87% cached, a 1GB read is
>50% cached, and a 3GB read is 25% cached.
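
As a sanity check on those figures, here is a small standalone C sketch
(a hypothetical illustration only, not PostgreSQL planner code) that
evaluates both the 5%-of-effective_cache_size threshold model and the
smooth-curve formula for the example sizes, assuming
effective_cache_size = 3GB:

/* Sanity check of the two caching models discussed above.
 * Hypothetical standalone sketch, not PostgreSQL planner code.
 * Sizes are in MB; effective_cache_size is assumed to be 3GB. */
#include <stdio.h>

#define ECS_MB 3072.0   /* effective_cache_size = 3GB */

/* Threshold model: a relation smaller than 5% of effective_cache_size
 * is fully cached; anything larger has only that 5% slice cached. */
static double threshold_cached_fraction(double table_mb)
{
    double cap_mb = 0.05 * ECS_MB;  /* about 153MB */
    return (table_mb <= cap_mb) ? 1.0 : cap_mb / table_mb;
}

/* Smooth curve: fraction NOT cached = 3r / (3r + effective_cache_size),
 * so fraction cached = effective_cache_size / (3r + effective_cache_size). */
static double smooth_cached_fraction(double read_mb)
{
    return ECS_MB / (3.0 * read_mb + ECS_MB);
}

int main(void)
{
    double sizes[] = {150.0, 1024.0, 3072.0};   /* the examples above */

    for (int i = 0; i < 3; i++)
        printf("%4.0f MB: threshold %5.1f%%  smooth curve %5.1f%%\n",
               sizes[i],
               100.0 * threshold_cached_fraction(sizes[i]),
               100.0 * smooth_cached_fraction(sizes[i]));
    return 0;
}

For 150MB, 1GB, and 3GB reads it prints 100%/15%/5% under the threshold
model and roughly 87%/50%/25% under the smooth curve, matching the
figures quoted above.
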
>> BTW, it seems that all these variants have an implicit assumption
>> that if you're reading a small part of the table it's probably part
>> of the working set; which is an assumption that could be 100% wrong.
>> I don't see a way around it without trying to characterize the data
>> access at an unworkably fine level, though.
>
>Me neither, but I think it will frequently be true, and I'm not sure
>it will hurt very much when it isn't. I mean, if you execute the same
>query repeatedly, that data will become hot soon enough. If you
>execute a lot of different queries that each touch a small portion of
>a big, cold table, we might underestimate the costs of the index
>probes, but so what? There's probably no better strategy for
>accessing that table anyway. Perhaps you can construct an example
>where this underestimate affects the join order in an undesirable
>fashion, but I'm having a hard time getting worked up about that as a
>potential problem case. Our current system - where we essentially
>assume that the caching percentage is uniform across the board - can
>have the same problem in less artificial cases.
>
>--
>Robert Haas
>EnterpriseDB: http://www.enterprisedb.com
>The Enterprise PostgreSQL Company

On a thread some time ago, on a similar subject, I opined that I missed
the ability to assign tables to tablespaces and buffer pools to
tablespaces, which would make it possible to pin the tables you need
(a One True Lookup Table, for example, or a Customer table) in memory
without fear of eviction. I was soundly beaten about the face and
breast. It really is an "Enterprise" way of handling the situation.

regards,
Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance