On Fri, 2007-06-01 at 16:58 +0400, Teodor Sigaev wrote:
> >> <2007-06-01 13:11:29.365 CEST:%> DEBUG: 00000: Ressource manager (13)
> >> has partial state information
> > To me, this points clearly to there being an improperly completed action
> > in resource manager 13. (GIN) In summary, it appears that there may be
> > an issue with the GIN code for WAL recovery and this is affecting the
> > Warm Standby.
>
> Hmm. I found that gin_xlog_cleanup doesn't reset incomplete_splits list. Is it
> possible reason of bug?

Hi Teodor,

Hmm, well, the list should be empty by that point anyway. That code is
only executed at the end of xlog replay, not half-way through as we are
seeing.

There are two possibilities:

1. There are some incomplete splits, pointing to a likely bug in GIN.

2. There are so many index splits that we aren't able to make a
successful restartpoint using the current mechanism. Not a bug, but
would be an issue with how restartpoints interact with GIN (possibly
other index types also).

When we wrote this I thought (2) would be a problem, but it's not shown
up to be so for btrees (yet, I guess). I have some ideas if it's (2).

The attached patch should show which of these it is. I'll dress it up a
little better so we have a debug option on this. Please note I've not
tested this patch myself, so Frank, if you don't mind me splatting
something at you, we'll see what we see.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

Index: src/backend/access/gin/ginxlog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/gin/ginxlog.c,v
retrieving revision 1.6
diff -c -r1.6 ginxlog.c
*** src/backend/access/gin/ginxlog.c	5 Jan 2007 22:19:21 -0000	1.6
--- src/backend/access/gin/ginxlog.c	1 Jun 2007 13:35:05 -0000
***************
*** 26,37 ****
  	BlockNumber leftBlkno;
  	BlockNumber rightBlkno;
  	BlockNumber rootBlkno;
  } ginIncompleteSplit;
  
  static List *incomplete_splits;
  
  static void
! pushIncompleteSplit(RelFileNode node, BlockNumber leftBlkno, BlockNumber rightBlkno, BlockNumber rootBlkno)
  {
  	ginIncompleteSplit *split;
  
--- 26,39 ----
  	BlockNumber leftBlkno;
  	BlockNumber rightBlkno;
  	BlockNumber rootBlkno;
+ 	XLogRecPtr lsn;
  } ginIncompleteSplit;
  
  static List *incomplete_splits;
  
  static void
! pushIncompleteSplit(RelFileNode node, BlockNumber leftBlkno,
! 		BlockNumber rightBlkno, BlockNumber rootBlkno, XLogRecPtr lsn)
  {
  	ginIncompleteSplit *split;
  
***************
*** 43,48 ****
--- 45,51 ----
  	split->leftBlkno = leftBlkno;
  	split->rightBlkno = rightBlkno;
  	split->rootBlkno = rootBlkno;
+ 	split->lsn = lsn;
  
  	incomplete_splits = lappend(incomplete_splits, split);
  
***************
*** 324,330 ****
  			UnlockReleaseBuffer(rootBuf);
  		}
  		else
! 			pushIncompleteSplit(data->node, data->lblkno, data->rblkno, data->rootBlkno);
  
  		UnlockReleaseBuffer(rbuffer);
  		UnlockReleaseBuffer(lbuffer);
--- 327,333 ----
  			UnlockReleaseBuffer(rootBuf);
  		}
  		else
! 			pushIncompleteSplit(data->node, data->lblkno, data->rblkno, data->rootBlkno, lsn);
  
  		UnlockReleaseBuffer(rbuffer);
  		UnlockReleaseBuffer(lbuffer);
***************
*** 600,605 ****
--- 603,623 ----
  gin_safe_restartpoint(void)
  {
  	if (incomplete_splits)
+ 	{
+ 		ListCell *l;
+ 		int nsplits = list_length(incomplete_splits);
+ 
+ 		elog(LOG,"GIN incomplete splits=%d", nsplits);
+ 		if (nsplits < 10)
+ 		{
+ 			foreach(l, incomplete_splits)
+ 			{
+ 				ginIncompleteSplit *split = (ginIncompleteSplit *) lfirst(l);
+ 				elog(LOG,"GIN incomplete split root:%u l:%u r:%u at redo %X/%X",
+ 					split->rootBlkno, split->leftBlkno, split->rightBlkno, split->lsn.xlogid, split->lsn.xrecoff);
+ 			}
+ 		}
  		return false;
+ 	}
  	return true;
  }
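
For anyone reading the patch without the GIN source to hand, here is a rough standalone sketch of the bookkeeping it instruments. This is not the ginxlog.c code: PendingSplit, push_split, forget_split, safe_restartpoint and oldest_pending_lsn are hypothetical names, the fixed-size array stands in for the backend List, and a plain 64-bit integer stands in for XLogRecPtr. The assumption is that a split is remembered when its redo record is replayed and dropped again when the matching parent insert is replayed, so a restartpoint is only safe while nothing is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for ginIncompleteSplit. */
typedef struct
{
    uint32_t leftBlkno;
    uint32_t rightBlkno;
    uint64_t lsn;           /* WAL position of the record that made the split */
} PendingSplit;

#define MAX_PENDING 64

PendingSplit pending[MAX_PENDING];
int          npending = 0;

/* Redo of a page split: remember it until the parent has been updated. */
void
push_split(uint32_t left, uint32_t right, uint64_t lsn)
{
    assert(npending < MAX_PENDING);
    pending[npending].leftBlkno = left;
    pending[npending].rightBlkno = right;
    pending[npending].lsn = lsn;
    npending++;
}

/* Redo of the matching parent insert: the split is now complete. */
bool
forget_split(uint32_t left, uint32_t right)
{
    for (int i = 0; i < npending; i++)
    {
        if (pending[i].leftBlkno == left && pending[i].rightBlkno == right)
        {
            pending[i] = pending[npending - 1];   /* order does not matter */
            npending--;
            return true;
        }
    }
    return false;
}

/* A restartpoint is only safe while no split is half-finished. */
bool
safe_restartpoint(void)
{
    return npending == 0;
}

/* LSN of the oldest pending split, or 0 when nothing is pending. */
uint64_t
oldest_pending_lsn(void)
{
    uint64_t oldest = 0;

    for (int i = 0; i < npending; i++)
        if (oldest == 0 || pending[i].lsn < oldest)
            oldest = pending[i].lsn;
    return oldest;
}
```

One way to read the logged LSNs, under the same assumptions: if the oldest pending LSN stays put while replay moves well past it, that looks like possibility (1), a split that never gets completed. If the pending entries keep changing and their LSNs keep advancing but the list is never empty at the moment a restartpoint is attempted, that looks more like possibility (2).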