Mark <mwchambers@xxxxxxxxx> writes:
> I have a long-running job which deletes all of the common_student table
> and then repopulates it. It takes a long time to load all the other data
> and commit the transaction. I didn't think the delete inside the
> transaction would have any effect until it was committed or rolled back.

> I will have to rewrite the application so it updates the existing rows
> rather than deleting them all and then inserting.

Hmm ... I'm not sure that will actually make things better. The root
of the issue is what analyze.c does with DELETE_IN_PROGRESS tuples:

     * We count delete-in-progress rows as still live, using
     * the same reasoning given above; but we don't bother to
     * include them in the sample.

The "reasoning given above" is a reference to what happens for
INSERT_IN_PROGRESS tuples:

     * Insert-in-progress rows are not counted.  We assume
     * that when the inserting transaction commits or aborts,
     * it will send a stats message to increment the proper
     * count.  This works right only if that transaction ends
     * after we finish analyzing the table; if things happen
     * in the other order, its stats update will be
     * overwritten by ours.  However, the error will be large
     * only if the other transaction runs long enough to
     * insert many tuples, so assuming it will finish after us
     * is the safer option.

Now the problem with this, from your perspective, is that *neither*
INSERT_IN_PROGRESS nor DELETE_IN_PROGRESS tuples are included in
ANALYZE's data sample. So a long-running update transaction will cause
all the rows it changes to be excluded from the sample until commit.
This seems like a bad thing, and it definitely means that adjusting
your app as you're contemplating won't help.

In order to handle the bulk-update scenario sanely, it seems like we
ought to allow one of INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples
to be included. And it looks like, for consistency with the
row-counting logic, the one that's included needs to be
DELETE_IN_PROGRESS. This is a slightly annoying conclusion, because it
means we're accumulating stats that we know are likely to soon be
obsolete, but I think sampling INSERT_IN_PROGRESS tuples instead would
lead to very strange results in some cases.

In short, I think we want to do this to the DELETE_IN_PROGRESS logic:

                     if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
                         deadrows += 1;
                     else
+                    {
+                        sample_it = true;
                         liverows += 1;
+                    }

with suitable adjustment of the adjacent comment.

Thoughts?

			regards, tom lane
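For readers without the source tree at hand: the fragment above sits in
the visibility switch of acquire_sample_rows() in
src/backend/commands/analyze.c, which classifies each candidate tuple
by the result of HeapTupleSatisfiesVacuum(). Below is a rough sketch of
that switch with the proposed change applied. It is paraphrased from
the code of that era rather than quoted, some details are elided, and
the reworded DELETE_IN_PROGRESS comment is only illustrative of the
"suitable adjustment" mentioned above, not actual committed wording:

    switch (HeapTupleSatisfiesVacuum(&targtuple, OldestXmin, targbuffer))
    {
        case HEAPTUPLE_LIVE:
            sample_it = true;
            liverows += 1;
            break;

        case HEAPTUPLE_DEAD:
        case HEAPTUPLE_RECENTLY_DEAD:
            deadrows += 1;
            break;

        case HEAPTUPLE_INSERT_IN_PROGRESS:
            /*
             * Not counted or sampled, per the reasoning quoted
             * above, unless the inserter is our own transaction.
             */
            if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(targtuple.t_data)))
            {
                sample_it = true;
                liverows += 1;
            }
            break;

        case HEAPTUPLE_DELETE_IN_PROGRESS:
            /*
             * Count delete-in-progress rows as still live, and (with
             * the proposed change) include them in the sample too,
             * unless the deleter is our own transaction.  If the
             * deleting transaction hasn't committed by the time we
             * finish, its changes shouldn't influence our stats.
             * (Illustrative rewording only.)
             */
            if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
                deadrows += 1;
            else
            {
                sample_it = true;   /* the proposed addition */
                liverows += 1;
            }
            break;

        default:
            elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
            break;
    }

    if (sample_it)
    {
        /* ... feed targtuple into the reservoir sampler ... */
    }

The net effect for the bulk delete-and-reload case is that the
not-yet-committed deletions no longer empty out the sample: the old
rows continue to be counted and sampled as live until the deleting
transaction actually commits.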