Re: Streaming replication slave crash

Quentin Hartman <qhartman@xxxxxxxxxxxxxxxxxxx> · Fri, 29 Mar 2013 11:10:41 -0600

On Fri, Mar 29, 2013 at 10:50 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

> Quentin Hartman <qhartman@xxxxxxxxxxxxxxxxxxx> writes:

>> On Fri, Mar 29, 2013 at 10:37 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

>>> What process did you use for setting up the slave?

>> I used an rsync from the master while both were stopped.

> If the master was shut down cleanly (not -m immediate) then the bug fix
> I was thinking about wouldn't explain this.  The fact that the panic
> didn't recur after restarting seems to void that theory as well.  I'm
> not sure what to make of that angle.

Yes, it was shut down cleanly. A good thought, but I don't think it's relevant in this case.

> Can you determine which table is being complained of in the failure
> message, ie, what has relfilenode 63370 in database 63229?  If so it
> would be interesting to know what was being done to that table on the
> master.

Good point! Looking deeper into that, it's actually one of our smaller tables, and it doesn't seem to have any corruption on either server. I was able to select all the records from it, and the content seems sane. The only thing that would have been happening on that table is an INSERT or UPDATE.

I think I'm going to run with the spurious EC2 hiccup explanation. I'm comfortable with that given the extra due diligence I've done with your (and Lonni's) guidance.

Thanks!

QH