On Mon, Aug 17, 2009 at 7:59 PM, Tom Lane<tgl@xxxxxxxxxxxxx> wrote:
> Greg Stark <gsstark@xxxxxxx> writes:
>> Excluding the cases where our own xid is in the tuple I think the
>> relevant cases are either
>
>> xmin aborted or in progress (or in future)
>> MOVED_OFF and xvac committed
>> MOVED_IN and xvac aborted or is in progress (or in future)
>
> Ah. I hadn't bothered to check the code in detail before asking about
> the current XID. Given subsequent data, it seems that current XID must
> have moved past xvac while we were wondering about it. This could mean
> either corrupted xvac values, or that the crash caused current XID to go
> backwards (suggesting loss of both the current pg_control and a big
> chunk of WAL). Since multiple tuples on different pages were involved,
> I'm inclined to believe the second theory.

I would think xmin would be the fewest-entities possibility, but we'll
never know.

For what it's worth, at EDB I dealt with another case like this, and I
imagine others have too. I think it's too easy to do things in the
wrong order or miss a step and end up with these kinds of problems.

I would really like to know what happened here to cause the problem.
Do you have records of how you created the slave? When you took the
initial image, did you use a cold backup or a hot backup? Did you use
pg_start_backup()/pg_stop_backup()?

When you failed over, was there anything special happening? Was it
because of a failure on the master? Was a vacuum full running?

When the slave came up, do you have the log messages saying when it was
starting recovery, and when it was finishing recovery and starting
normal operations?

--
greg
http://mit.edu/~gsstark/resume.pdf
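
To make the three quoted cases concrete, here is a minimal sketch that writes them as a single predicate. It is only a restatement of the list in the quoted text, not PostgreSQL's actual visibility code. The names `XactState` and `tuple_looks_dead` are hypothetical and chosen for illustration. As in the quoted text, the case where our own xid is in the tuple is excluded.

```python
from enum import Enum


class XactState(Enum):
    """Hypothetical states a transaction ID can appear to be in."""
    COMMITTED = "committed"
    ABORTED = "aborted"
    IN_PROGRESS = "in progress"
    IN_FUTURE = "in future"  # XID is ahead of the current XID counter


# States that the quoted list treats as "not (yet) committed".
NOT_COMMITTED = {XactState.ABORTED, XactState.IN_PROGRESS, XactState.IN_FUTURE}


def tuple_looks_dead(xmin_state, moved_off=False, moved_in=False, xvac_state=None):
    """Return True if the tuple matches one of the three quoted cases.

    Assumption: moved_off / moved_in stand for the MOVED_OFF / MOVED_IN
    flags, and xvac_state is the apparent state of the tuple's xvac.
    """
    # Case 1: xmin aborted or in progress (or in future)
    if xmin_state in NOT_COMMITTED:
        return True
    # Case 2: MOVED_OFF and xvac committed
    if moved_off and xvac_state is XactState.COMMITTED:
        return True
    # Case 3: MOVED_IN and xvac aborted or in progress (or in future)
    if moved_in and xvac_state in NOT_COMMITTED:
        return True
    return False


# If the current XID went backwards, an xvac that really committed could
# appear to be "in future", so a MOVED_IN tuple would match case 3.
example = tuple_looks_dead(
    XactState.COMMITTED, moved_in=True, xvac_state=XactState.IN_FUTURE
)
```

Under this sketch, the same MOVED_IN tuple stops matching once the current XID has moved past xvac and xvac is seen as committed, which fits the second theory in the quoted text.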