On Mon, Jun 1, 2015 at 4:55 PM, Noah Misch <noah@xxxxxxxxxxxx> wrote:
> While testing this (with inconsistent-multixact-fix-master.patch applied,
> FWIW), I noticed a nearby bug with a similar symptom. TruncateMultiXact()
> omits the nextMXact==oldestMXact special case found in each other
> find_multixact_start() caller, so it reads the offset of a not-yet-created
> MultiXactId. The usual outcome is to get rangeStart==0, so we truncate less
> than we could. This can't make us truncate excessively, because
> nextMXact==oldestMXact implies no table contains any mxid. If nextMXact
> happens to be the first of a segment, an error is possible. Procedure:
>
> 1. Make a fresh cluster.
> 2. UPDATE pg_database SET datallowconn = true
> 3. Consume precisely 131071 mxids. Number of offsets per mxid is unimportant.
> 4. vacuumdb --freeze --all
>
> Expected state after those steps:
> $ pg_controldata | grep NextMultiXactId
> Latest checkpoint's NextMultiXactId: 131072
>
> Checkpoint will fail like this:
> 26699 2015-05-31 17:22:33.134 GMT LOG: statement: checkpoint
> 26661 2015-05-31 17:22:33.134 GMT DEBUG: performing replication slot checkpoint
> 26661 2015-05-31 17:22:33.136 GMT ERROR: could not access status of transaction 131072
> 26661 2015-05-31 17:22:33.136 GMT DETAIL: Could not open file "pg_multixact/offsets/0002": No such file or directory.
> 26699 2015-05-31 17:22:33.234 GMT ERROR: checkpoint request failed
> 26699 2015-05-31 17:22:33.234 GMT HINT: Consult recent messages in the server log for details.
> 26699 2015-05-31 17:22:33.234 GMT STATEMENT: checkpoint
>
> This does not block startup, and creating one mxid hides the problem again.
> Thus, it is not a top-priority bug like some other parts of this thread. I
> mention it today mostly so it doesn't surprise hackers testing other fixes.

Thanks. As mentioned elsewhere in the thread, I discovered that the
same problem exists for page boundaries, with a different error
message.
I've tried the attached repro scripts on 9.3.0, 9.3.5, 9.4.1 and
master with the same results:

FATAL: could not access status of transaction 2048
DETAIL: Could not read from file "pg_multixact/offsets/0000" at offset 8192: Undefined error: 0.

FATAL: could not access status of transaction 131072
DETAIL: Could not open file "pg_multixact/offsets/0002": No such file or directory.

But, yeah, this isn't the bug we're looking for.

--
Thomas Munro
http://www.enterprisedb.com
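For anyone wondering why 2048 and 131072 are the interesting values in the two errors above, here is a rough sketch of the arithmetic. It is not PostgreSQL code. It assumes a default build (8 kB blocks, 4-byte offset entries, 32 pages per SLRU segment file), and the constant and function names are made up for illustration. The byte offset it returns is the start of the page within the segment file, which is what the "at offset 8192" message appears to report.

```python
# Assumed default-build constants (not read from any server).
BLCKSZ = 8192            # bytes per page
OFFSET_ENTRY_SIZE = 4    # bytes per entry in pg_multixact/offsets
PAGES_PER_SEGMENT = 32   # pages per SLRU segment file

ENTRIES_PER_PAGE = BLCKSZ // OFFSET_ENTRY_SIZE               # 2048
ENTRIES_PER_SEGMENT = ENTRIES_PER_PAGE * PAGES_PER_SEGMENT   # 65536


def locate_offsets_entry(mxid):
    """Map an mxid to (segment file name, byte offset of its page in that file).

    Hypothetical helper for illustration only.
    """
    page = mxid // ENTRIES_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    page_start = (page % PAGES_PER_SEGMENT) * BLCKSZ
    return "%04X" % segment, page_start


def is_first_on_page(mxid):
    """True when the mxid is the first entry of a page (the page-boundary case)."""
    return mxid % ENTRIES_PER_PAGE == 0


def is_first_in_segment(mxid):
    """True when the mxid is the first entry of a segment file."""
    return mxid % ENTRIES_PER_SEGMENT == 0


if __name__ == "__main__":
    for mxid in (2048, 131072):
        name, page_start = locate_offsets_entry(mxid)
        print(mxid, name, page_start,
              is_first_on_page(mxid), is_first_in_segment(mxid))
```

Under these assumptions, mxid 2048 is the first entry of the second page of segment 0000 (byte offset 8192), and mxid 131072 is the first entry of segment 0002. In both cases the next mxid would live on a page or in a file that has not been created yet, which fits the "could not read" and "could not open" errors respectively.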
Attachment: checkpoint-page-boundary.sh
Description: Bourne shell script

Attachment: checkpoint-segment-boundary.sh
Description: Bourne shell script
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general