On Thu, Nov 11, 2010 at 10:00 AM, Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx> wrote:
> Mladen Gogala <mladen.gogala@xxxxxxxxxxx> wrote:
>
>> create a definitive bias toward one type of the execution plan.
>
> We're talking about trying to support the exact opposite. This all
> started because a database which was tuned for good response time
> for relatively small queries against a "hot" portion of some tables
> chose a bad plan for a weekend maintenance run against the full
> tables. We're talking about the possibility of adapting the cost
> factors based on table sizes as compared to available cache, to more
> accurately model the impact of needing to do actual disk I/O for
> such queries.
>
> This also is very different from trying to adapt queries to what
> happens to be currently in cache. As already discussed on a recent
> thread, the instability in plans and the failure to get to an
> effective cache set make that a bad idea. The idea discussed here
> would maintain a stable plan for a given query, it would just help
> choose a good plan based on the likely level of caching.

Let's back up a moment and talk about what the overall goal is, here. Ideally, we would like PostgreSQL to have excellent performance at all times, under all circumstances, with minimal tuning. Therefore, we do NOT want to add variables that will, by design, need constant manual adjustment. That is why I suggested that Tom's idea of an assume_cached GUC is probably not what we really want to do.

On the other hand, what I understand Mladen to be suggesting is something completely different. He's basically saying that, of course, he wants it to work out of the box most of the time, but since there are guaranteed to be cases where it doesn't, how about providing some knobs that aren't intended to be routinely twiddled, but which are available in case of emergency? Bravo, I say!

Consider the case of whether a table is cached. Currently, we estimate that it isn't, and you can sort of twiddle that assumption globally by setting seq_page_cost and random_page_cost. In 9.0, you can twiddle it with a bit more granularity by adjusting seq_page_cost and random_page_cost on a per-tablespace basis. But that's really intended to handle the case where you have one tablespace on an SSD and another that isn't. It doesn't really model caching at all; we're just abusing it as if it did. If 90% of a table is cached, you can't simply multiply the cost of reading it by 0.1, because now many of those reads will be random I/O rather than sequential I/O. The right thing to do is to estimate what percentage of the table will be cached, then estimate how much random and sequential I/O it'll take to get the rest, and then compute the cost.

To do that, we can adopt the approach proposed upthread of comparing the size of the table to effective_cache_size. We come up with some function f, such that f(effective_cache_size, table_size) = assumed_caching_percentage, and then from there we estimate random I/Os and sequential I/Os, and from there we estimate costs. This is a good system, almost certainly better than what we have now. However, it's also guaranteed not to always work. The DBA may know, for example, that one particular table that is quite large is always fully cached because it is very heavily accessed. So why not let them pin the assumed_caching_percentage for that table to 100%? I don't see any reason at all. Most people will never need to touch that setting, but it's there in case you really, really need it.
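Just to make the shape of that calculation concrete, here's a rough sketch. Nobody has settled on what f should actually look like, so the simple ratio below, the blended-cost arithmetic, and the ALTER TABLE syntax in the comments are purely illustrative, not a design:

    -- One possible shape for f, written as a SQL function only so it's easy
    -- to read; the real thing would of course live in the planner's cost
    -- model, and the exact formula is very much up for debate.
    CREATE FUNCTION assumed_caching_fraction(table_pages numeric,
                                             cache_pages numeric)
    RETURNS numeric LANGUAGE sql IMMUTABLE AS $$
        -- $1 = table_pages, $2 = cache_pages
        SELECT LEAST(1.0, $2 / GREATEST($1, 1.0))
    $$;

    -- e.g. a 1,000,000-page table against an effective_cache_size of 393,216
    -- pages (3GB at 8kB/page) comes out to roughly 39% assumed cached:
    SELECT assumed_caching_fraction(1000000, 393216);

    -- The I/O component of scanning the table would then be blended, roughly:
    --
    --   io_cost = table_pages * (1 - cached_fraction) * page_cost
    --
    -- where page_cost is seq_page_cost or random_page_cost depending on how
    -- we expect the uncached remainder to be fetched, and the cached portion
    -- is charged CPU cost only.
    --
    -- The per-table escape hatch could then be something like this
    -- (hypothetical syntax; it does not exist today):
    --
    --   ALTER TABLE hot_table SET (assumed_caching_percentage = 100);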
We've traditionally been reluctant to do this sort of thing (as the email Tom just sent reflects), but I think we should soften up a bit. A product gets hard to use when it has knobs that MUST be tuned to make it work at all, and certainly AFAICT Oracle falls into that category. My rollback segment is full? My table is out of extents? Well, allocate some more space, then; I certainly wouldn't have imposed an arbitrary cap on the table size if I'd known I was doing it. However, that's not the same thing as having knobs that are *available* when the shit really hits the fan. By failing to provide that type of knob, we're not facilitating ease of use; we're just making it difficult for the small percentage of people who have problems to fix them, which is another kind of non-ease-of-use.

In fact, we already have a few knobs of this type. We have a statistics target which can be overridden on a per-column basis, and beginning in 9.0, you can override the planner's n_distinct estimates in the same way. Why? Because we know that it's not achievable to estimate n_distinct accurately in all cases without making ANALYZE unreasonably slow. I bet that very, VERY few people will ever use that feature, so it costs nothing in terms of "oh, another setting I have to configure". But for people who are getting bitten by inaccurate n_distinct estimates, it will be very nice to have that as an escape hatch. I see no harm, and much value, in providing similar escape hatches elsewhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company