Kevin Grittner <kgrittn@xxxxxxxxx> writes:
> Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> I wonder if we could ameliorate this problem by making
>> get_actual_variable_range() use SnapshotDirty
>> In that thread I claimed that a current MVCC snapshot was the
>> most appropriate thing, which it probably is;

> If it reads from the end of the index, won't it actually be reading
> starting with the value we will obtain using SnapshotDirty, and
> since the transaction is not yet committed, won't we be making the
> trip to the heap for each index tuple we read from that point?  Why
> doesn't that make SnapshotDirty more appropriate for purposes of
> cost estimation?

This code isn't used (directly) for cost estimation.  It's used for
selectivity estimation, that is, counting how many valid rows the query
will fetch when it executes.  We can't know that with certainty, of
course, so the argument boils down to how good an approximation we're
getting and how much it costs to get it.

The original coding here was SnapshotNow, which could be defended on
the grounds that it was the newest available data and thus the best
possible approximation if the query gets executed with a snapshot
taken later.  That's still what's being used in the field, and of
course it's got the problem we're on about, that uncommitted rows
cause us to move on to the next index entry (and thrash the
ProcArrayLock for each uncommitted row we hit :-().

In HEAD we replaced that with GetActiveSnapshot, which is a better
approximation if you assume the query will be executed with this same
snapshot, and otherwise slightly worse (because older).  But it's just
the same as far as the cost imposed by uncommitted rows goes.

My proposal to use SnapshotDirty instead is based on the thought that
it will give the same answers as SnapshotNow for definitely-live or
definitely-dead rows, while not imposing such large costs for in-doubt
rows: it will accept the first in-doubt row, and thus prevent us from
moving on to the next index entry.  The reported problems seem to
involve hundreds or thousands of in-doubt rows at the extreme of the
index, so the cost will go down proportionally to whatever that count
is.  As far as the accuracy of the approximation is concerned, it's a
win if you imagine that pending inserts will have committed by the
time the query's execution snapshot is taken.  Otherwise not so much.

The other thing we might consider doing is using SnapshotAny, which
would amount to just taking the extremal index entry at face value.
This would be cheap all right (in fact, we might later be able to
optimize it to not even visit the heap).  However, I don't like the
implications for the accuracy of the approximation.  It seems quite
likely that an erroneous transaction could insert a wildly
out-of-range index entry and then roll back --- or even if it
committed, somebody might soon come along and clean up the bogus
entry in a separate transaction.  If we use SnapshotAny then we'd
believe that bogus entry until it got vacuumed, not just till it got
deleted.  This is a fairly scary proposition, because the wackier
that extremal value is, the more impact it will have on the
selectivity estimates.

If it's demonstrated that SnapshotDirty doesn't reduce the estimation
costs enough to solve the performance problem the complainants are
facing, I'd be willing to consider using SnapshotAny here.  But I
don't want to take that step until it's proven that the more
conservative approach doesn't solve the performance problem.
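To make the shape of that concrete for anyone who hasn't opened
selfuncs.c lately, here is a heavily condensed sketch of the endpoint
probe in get_actual_variable_range() with a dirty snapshot.  It is
only a sketch, not the patch itself: the function name and argument
list are made up for illustration, scan-key setup, datum extraction,
and error cleanup are omitted, and it assumes the existing index-AM
calls (index_beginscan and friends) with selfuncs.c's headers already
in scope.

static bool
probe_index_endpoint_sketch(Relation heapRel, Relation indexRel,
                            ScanKey scankeys, ScanDirection indexscandir)
{
    SnapshotData SnapshotDirty;
    IndexScanDesc index_scan;
    HeapTuple   tup;
    bool        have_data = false;

    /*
     * A dirty snapshot accepts the first in-doubt (uncommitted) row
     * instead of skipping past it, so at most one index entry gets
     * examined even when hundreds of uncommitted rows sit at the end
     * of the index.  HEAD currently passes GetActiveSnapshot() here,
     * which keeps advancing over every in-doubt entry.
     */
    InitDirtySnapshot(SnapshotDirty);

    index_scan = index_beginscan(heapRel, indexRel, &SnapshotDirty, 1, 0);
    index_rescan(index_scan, scankeys, 1, NULL, 0);

    /* Fetch the first heap tuple the dirty snapshot will accept. */
    if ((tup = index_getnext(index_scan, indexscandir)) != NULL)
    {
        /* ... extract the index column value from tup here ... */
        have_data = true;
    }

    index_endscan(index_scan);

    return have_data;
}

The only change of substance relative to what's in HEAD is which
snapshot gets handed to index_beginscan().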
There's an abbreviated version of this argument in the comments in my
proposed patch at
http://www.postgresql.org/message-id/11927.1384199294@xxxxxxxxxxxxx
What I'm hoping will happen next is that the complainants will
hot-patch that and see if it fixes their problems.  We can't really
determine what to do without that information.

> It doesn't look like Robert did either, if you read the whole
> message.  In fact, he also questioned why index tuples which would
> need to be read if we process from that end of the index don't
> matter for purposes of cost estimation.

The key point here is that we are not looking to estimate the cost of
getting the extremal value.  This code gets called if we are
interested in a probe value falling within the endmost histogram bin;
that's not necessarily, or even probably, the extremal value.  We're
just trying to find out if the bin endpoint has moved a lot since the
last ANALYZE.

Another way of explaining this is that what Robert was really
suggesting is that we ought to account for the cost of fetching rows
that are then determined not to be visible to the query.  That's
true, but it would be a conceptual error to factor that in by
supposing that they're visible.  The planner doesn't, at present,
account for this cost explicitly, but I believe it's at least
somewhat handled by the fact that we charge for page accesses on the
basis of selectivity fraction times physical table size.  A table
containing a lot of uncommitted/dead tuples will get charged more I/O
than a less bloated table.  Of course, this approach can't account
for special cases like "there's a whole lot of uncommitted tuples
right at the end of the index range, and not so many elsewhere".
That's below the granularity of what we can sensibly model at the
moment, I'm afraid.

			regards, tom lane
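P.S.  To put a number on the "bloated table gets charged more I/O"
point, here's a back-of-the-envelope illustration.  The numbers and
the helper function are invented for the example; this is not the
planner's actual cost arithmetic, which involves more than a simple
product.

#include <stdio.h>

/* Simplified stand-in for "pages charged ~ selectivity * table size". */
static double
charged_pages(double selectivity, double physical_pages)
{
    return selectivity * physical_pages;
}

int
main(void)
{
    double selectivity = 0.01;      /* query fetches 1% of the rows */
    double tight_pages = 10000.0;   /* freshly vacuumed table */
    double bloated_pages = 25000.0; /* same live rows, much dead space */

    printf("tight table:   %.0f pages charged\n",
           charged_pages(selectivity, tight_pages));
    printf("bloated table: %.0f pages charged\n",
           charged_pages(selectivity, bloated_pages));
    return 0;
}

Same selectivity and the same live data, but the bloated table gets
charged two and a half times the I/O, which is roughly how the extra
fetches of invisible rows get paid for --- on average, if not in the
special cases mentioned above.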