Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> · Fri, 5 Jun 2015 15:53:31 -0300

Robert Haas wrote:
> On Fri, Jun 5, 2015 at 2:20 AM, Noah Misch <noah@xxxxxxxxxxxx> wrote:
> > On Thu, Jun 04, 2015 at 05:29:51PM -0400, Robert Haas wrote:
> >> Here's a new version with some more fixes and improvements:
> >
> > I read through this version and found nothing to change.  I encourage other
> > hackers to study the patch, though.  The surrounding code is challenging.
> 
> Andres tested this and discovered that my changes to
> find_multixact_start() were far more creative than intended.
> Committed and back-patched with a trivial fix for that stupidity and a
> novel-length explanation of the changes.

I think novel-length is fine.  The bug itself is pretty complicated, and
so is the solution.  Many thanks for working through this.

FWIW I tested with the (attached) reproducer script(*) for my customer's
problem, and it works fine now where it failed before.  One thing which
surprised me a bit, but in hindsight should have been pretty obvious, is
that the "multixact member protections are fully armed" message is only
printed once the standby gets out of recovery, instead of when it
reaches a consistent state or some such earlier point.

(*) Actually the script cheats to get past an issue I couldn't figure
out, namely that a file can't be "seeked"; I just "touch" an empty file
there, which causes the same error situation as in my customer's log.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: repro-chkpt-replay-failure.sh (Bourne shell script)

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)