Maxim Boguk <maxim.boguk@xxxxxxxxx> writes:
> On Tue, Feb 21, 2012 at 5:32 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> If the index key values are not private information, could we see that
>> with "pg_filedump -i -f" not just bare?

> There is it:
> http://maximboguk.com/static/etc/agency_statistics_pkey.pg_full_filedump.gz

Thanks.  The index definitely contains about a hundred apparently-live
index entries, which shouldn't be there considering the table is empty,
and it doesn't seem to be corrupted in any obvious way.

However, I noticed that all the pages have btpo_cycleid zero, which
seems a bit improbable given that some of them have BTP_SPLIT_END set,
and the only way for that flag to become set is if there are adjacent
index pages with different btpo_cycleid values.  This led me to look
for bugs that might reset btpo_cycleid incorrectly, and I believe I see
one.  _bt_delitems_delete has this code that was copied-and-pasted from
its previous incarnation as _bt_delitems:

    /*
     * We can clear the vacuum cycle ID since this page has certainly been
     * processed by the current vacuum scan.
     */
    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    opaque->btpo_cycleid = 0;

That comment is now a lie: this code is reached from _bt_findinsertloc,
not from vacuum.  So the series of events I'm imagining is:

1. Autovacuum starts.

2. A page gets split; the page in question is beyond where autovacuum
has reached in its index scan, but a page from before that point gets
recycled to become the new right half.  Both these pages get marked
with the vacuum's cycle ID.

3. Whatever's doing the insertions continues to insert into the same
range of the index.  (I notice that your INSERT query has a GROUP BY
that uses the high-order columns of your pkey, which makes it fairly
likely that insertions occur in something close to index order.  So
this is not unlikely.)  It hits a couple of duplicate index entries,
finds they're dead, and hence marks them dead and sets the
BTP_HAS_GARBAGE flag on the new right-half page (we have to assume this
happens, because the split would've cleared that flag).

4. It keeps on inserting and eventually fills the new right-half page.
At that point, since BTP_HAS_GARBAGE is set, it runs
_bt_delitems_delete, which incorrectly clears btpo_cycleid on that
page.

5. Finally, the concurrent vacuum reaches the originally split page.
It notes that btpo_cycleid matches its own cycle ID and correctly
realizes it has to go back and process the right-half page.  However,
when it reaches that page, it finds btpo_cycleid already zero and
concludes it doesn't need to scan that page.  Result: any entries that
should have been removed, but were not already marked DEAD at the time
_bt_delitems_delete ran, are never removed from the index.

Now obviously I can't prove that this is exactly what happened in your
index.  It would have to have happened several times during a single
vacuum run, I think, because the remaining index entries are spread
across half a dozen index pages.  However, all those pages are
relatively early in the index --- the first such page is block 2, and
the last one block 49, out of a 605-page index.  And we know that your
usage pattern is such that a whole lot of inserts would be going on at
the same time that vacuum had been removing a whole lot of old entries
(and hence causing low-numbered pages to get freed up and returned to
the free space map).  So it seems to fit.

Anyway, clearing btpo_cycleid there is unquestionably wrong.
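To make the asymmetry concrete, here is a sketch of which deletion path
may safely clear the cycle ID.  This is an illustration, not a proposed
patch: the real functions live in src/backend/access/nbtree/nbtpage.c
and do much more (actually deleting the items, WAL logging, etc.), and
the _sketch names below are made up, reduced to just the cycle-ID
handling at issue:

    #include "postgres.h"
    #include "access/nbtree.h"

    /* Modeled on _bt_delitems_vacuum: called from VACUUM's own index scan */
    static void
    bt_delitems_vacuum_sketch(Page page)
    {
        BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);

        /*
         * Safe here: VACUUM itself is processing this page, so the
         * split-tracking cycle ID has served its purpose and can be
         * discarded.
         */
        opaque->btpo_cycleid = 0;
    }

    /* Modeled on _bt_delitems_delete: reached from _bt_findinsertloc */
    static void
    bt_delitems_delete_sketch(Page page)
    {
        /*
         * Must NOT clear btpo_cycleid: an ordinary insertion knows
         * nothing about any concurrent vacuum's progress.  Zeroing the
         * cycle ID here is exactly what lets the vacuum skip the
         * right-half page in step 5 above.
         */
        (void) page;            /* nothing cycle-ID-related to do */
    }

The point is that only the vacuum-driven path can know the page has
been visited by the current vacuum cycle; the insert-driven
microvacuum cannot, so it has no business touching the field.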
			regards, tom lane