On Mon, May 6, 2013 at 1:14 AM, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
> On 6 May 2013 02:51, Mark Kirkwood <mark.kirkwood@xxxxxxxxxxxxxxx> wrote:
>> On 05/05/13 00:49, Simon Riggs wrote:
>>>
>>> On 3 May 2013 13:41, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
>>>
>>>> (3) to make the check on TransactionIdIsInProgress() into a heuristic,
>>>> since we don't *need* to check that, so if we keep checking the same
>>>> xid repeatedly we can reduce the number of checks or avoid xids that
>>>> seem to be long running. That's slightly more coding than my quick
>>>> hack here but seems worth it.
>>>>
>>>> I think we need both (1) and (3) but the attached patch does just (1).
>>>>
>>>> This is a similar optimisation to the one I introduced for
>>>> TransactionIdIsKnownCompleted(), except this applies to repeated
>>>> checking of as-yet-incomplete xids, and to bulk concurrent
>>>> transactions.
>>>
>>> ISTM we can improve performance of TransactionIdIsInProgress() by
>>> caching the procno of our last xid.
>>>
>>> Mark, could you retest with both these patches? Thanks.
>>>
>> Thanks Simon, will do and report back.
>
> OK, here's an easily reproducible test...
>
> Prep:
> DROP TABLE IF EXISTS plan;
> CREATE TABLE plan
> (
>     id  INTEGER NOT NULL,
>     typ INTEGER NOT NULL,
>     dat TIMESTAMP,
>     val TEXT NOT NULL
> );
> insert into plan select generate_series(1,100000), 0,
>     current_timestamp, 'some texts';
> CREATE UNIQUE INDEX plan_id ON plan(id);
> CREATE INDEX plan_dat ON plan(dat);
>
> testcase.pgb:
> select count(*) from plan where dat is null and typ = 3;
>
> Session 1:
> pgbench -n -f testcase.pgb -t 100
>
> Session 2:
> BEGIN; insert into plan select 1000000 + generate_series(1, 100000),
>     3, NULL, 'b';
>
> Transaction rate in Session 1 (in tps):
> (a) before we run Session 2:
>     Current: 5600 tps
>     Patched: 5600 tps
>
> (b) after Session 2 has run, but before its transaction ends:
>     Current: 56 tps
>     Patched: 65 tps

When I run this test case in single-client mode, I don't see nearly that much speedup; it goes from 38.99 TPS to 40.12 TPS. But it is still a speedup, and very reproducible (t-test p-val < 1e-40, n = 21 for both).

But when I tried it with 4 pgbench clients, I ran into a collapse of the performance, with TPS dropping to ~8. So far it is too variable to tell how reliable the speed-up from this patch is. Apparently the clients are all fighting over the spinlock on the ProcArrayLock.

This is a single quad-core "Intel(R) Xeon(R) CPU X3210 @ 2.13GHz".

So I agree with (3) above, about not checking TransactionIdIsInProgress() repeatedly. Or could we change the order of operations so that TransactionIdIsInProgress() is checked only after XidInMVCCSnapshot()?

Or perhaps the comment "XXX Can we test without the lock first?" could be implemented and save the day here?

Looking at the code, there is something that bothers me about this part:

    pxid = cxid = InvalidTransactionId;
    return false;

If it is safe to return false at this point (as determined by the stale values of pxid and cxid), then why would we clear out the stale values so they can't be used in the future to also short-circuit things? On the other hand, if the stale values need to be cleared so they don't mislead future invocations, why is it safe for this invocation to have made a decision based on them? Maybe with more thought I will see why this is so.
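To make the order-of-operations idea above concrete, the shape I have in mind inside HeapTupleSatisfiesMVCC() is roughly the following. This is a paraphrase from memory, not the actual tqual.c code, and it glosses over hint-bit maintenance, subtransactions, and the our-own-transaction cases:

    TransactionId xmin = HeapTupleHeaderGetXmin(tuple);

    /*
     * Check the snapshot first: XidInMVCCSnapshot() only reads our own
     * snapshot arrays, so it needs no ProcArrayLock.
     */
    if (XidInMVCCSnapshot(xmin, snapshot))
        return false;   /* invisible to us whether or not it is still running */

    /*
     * Only now pay for the shared-memory check, which takes ProcArrayLock
     * and is what the pgbench clients appear to be piling up on.
     */
    if (TransactionIdIsInProgress(xmin))
        return false;
    if (!TransactionIdDidCommit(xmin))
        return false;   /* inserter aborted or crashed */

Whether skipping TransactionIdIsInProgress() that early is actually safe with respect to hint-bit setting and the aborted case is exactly the kind of thing I have not thought through yet.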
...

> Which brings me back to Mark's original point, which is that we are
> x100 times slower in this case and it *is* because the choice of
> IndexScan is a bad one for this situation.
>
> After some thought on this, I do think we need to do something about
> it directly, rather than by tuning infrastructure (as I just
> attempted). The root cause here is that IndexScan plans are sensitive
> to mistakes in data distribution, much more so than other plan types.
>
> The two options, broadly, are to either
>
> 1. avoid IndexScans in the planner unless they have a *significantly*
> better cost. At the moment we use IndexScans if cost is lowest, even
> if that is only by a whisker.

This wouldn't work well in Mark's specific case, because the problem is that it is using the wrong index, not that it is using an index at all. There are two candidate indexes: one looks slightly better but then gets ruined by the in-progress insert, while the other looks slightly worse but would be resistant to the in-progress insert. Switching from the bad index to a seq scan is not going to fix things.

I don't think there is any heuristic solution here other than to keep track of the invisible data distribution as well as the visible data distribution.

The more I've thought about it, the more I see the charm of Mark's original proposal. Why not build the statistics assuming that the in-progress insert will commit? It is not a complete solution, because autoanalyze will not get triggered until the transaction completes. But why not let a manual ANALYZE get the benefit of seeing those rows?

The statistics serve two masters. One is to estimate how many rows will actually be returned. The other is to estimate how much work it will take to return them (including the work of groveling through a list of in-progress tuples). Right now those are implicitly treated as the same thing--we could change that and keep separate sets of statistics, but I think we could improve things somewhat without doing that.

For the first master, estimating the actual rows returned, I think counting in-progress rows is a wash. It seems about as likely that the bulk transaction that was in progress at the time of the last ANALYZE has already committed by planning time (but not yet triggered the autoanalyze) as it is that the bulk transaction is still in progress at planning time; so counting the rows as if they had committed is sometimes right and sometimes wrong, in roughly equal measure. But for the second master, counting the in-progress rows seems like a win all the time: either we actually see them and do the work, or we refuse to see them but still have to do the work to figure that out.

Cheers,

Jeff
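P.S. To be concrete about what "counting the in-progress rows" could look like: if I'm reading analyze.c correctly, the HEAPTUPLE_INSERT_IN_PROGRESS case in acquire_sample_rows() currently counts and samples a row only when it was inserted by our own transaction. The change Mark's proposal suggests to me is roughly the following -- a sketch from memory, not a tested patch, and whether it is really safe to feed such rows into the stats computations needs more thought:

            case HEAPTUPLE_INSERT_IN_PROGRESS:
                /*
                 * Count the row as live even when the inserting
                 * transaction is somebody else's and still open, on the
                 * assumption that it will commit, and that we will have
                 * to grovel through these tuples at execution time in
                 * any case.
                 */
                sample_it = true;
                liverows += 1;
                break;

That by itself still leaves the autoanalyze-triggering problem, of course.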