On Fri, May 29, 2015 at 10:37:57AM +1200, Thomas Munro wrote:
> On Fri, May 29, 2015 at 7:56 AM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:
> > - There's a third possible problem related to boundary cases in
> > SlruScanDirCbRemoveMembers, but I don't understand that one well
> > enough to explain it.  Maybe Thomas can jump in here and explain the
> > concern.
>
> I noticed something in passing which is probably not harmful, and not
> relevant to this bug report; it was just a bit confusing while
> testing: SlruScanDirCbRemoveMembers never deletes any files if
> rangeStart == rangeEnd.  In practice, if you have an idle cluster with
> a lot of multixact data and you VACUUM FREEZE all databases and then
> CHECKPOINT, you might be surprised to see no member files going away
> quite yet, but they'll eventually be truncated by a future checkpoint,
> once rangeEnd has had a chance to advance to the next page due to more
> multixacts being created.
>
> If we want to fix this one day, maybe the right thing to do is to
> treat the rangeStart == rangeEnd case the same way we treat rangeStart
> < rangeEnd, that is, to assume that the range of pages isn't
> wrapped/inverted in this case.

I agree.  Because we round rangeStart down to a segment boundary, the
oldest and next member offsets falling on the same page typically
implies rangeStart < rangeEnd.  Only when the page they share happens
to be the first page of a segment does one observe
rangeStart == rangeEnd.

While testing this (with inconsistent-multixact-fix-master.patch
applied, FWIW), I noticed a nearby bug with a similar symptom.
TruncateMultiXact() omits the nextMXact == oldestMXact special case
found in every other find_multixact_start() caller, so it reads the
offset of a not-yet-created MultiXactId.  The usual outcome is to get
rangeStart == 0, so we truncate less than we could.  This can't make
us truncate excessively, because nextMXact == oldestMXact implies no
table contains any mxid.  If nextMXact happens to be the first mxid of
a segment, though, an error is possible.  Procedure:

1. Make a fresh cluster.
2. UPDATE pg_database SET datallowconn = true
3. Consume precisely 131071 mxids.  Number of offsets per mxid is
   unimportant.  (One way to burn mxids is sketched in the P.S. below.)
4. vacuumdb --freeze --all

Expected state after those steps:

$ pg_controldata | grep NextMultiXactId
Latest checkpoint's NextMultiXactId:  131072

Checkpoint will then fail like this:

26699 2015-05-31 17:22:33.134 GMT LOG:  statement: checkpoint
26661 2015-05-31 17:22:33.134 GMT DEBUG:  performing replication slot checkpoint
26661 2015-05-31 17:22:33.136 GMT ERROR:  could not access status of transaction 131072
26661 2015-05-31 17:22:33.136 GMT DETAIL:  Could not open file "pg_multixact/offsets/0002": No such file or directory.
26699 2015-05-31 17:22:33.234 GMT ERROR:  checkpoint request failed
26699 2015-05-31 17:22:33.234 GMT HINT:  Consult recent messages in the server log for details.
26699 2015-05-31 17:22:33.234 GMT STATEMENT:  checkpoint

This does not block startup, and creating one mxid hides the problem
again.  Thus, it is not a top-priority bug like some other parts of
this thread.  I mention it today mostly so it doesn't surprise hackers
testing other fixes.

Thanks,
nm
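
P.S. For step 3, here is a minimal sketch of one way to consume mxids;
the table name, session layout, and loop are illustrative, not the
exact script I used.  A new MultiXactId is allocated each time a
transaction locks a row that a different, still-open transaction
already has locked, so hold one row lock open in a first session and
take fresh locks from short-lived transactions in a second:

  -- session 1: create a victim row, lock it, and leave this
  -- transaction open while session 2 runs
  CREATE TABLE mxburn (id int PRIMARY KEY);
  INSERT INTO mxburn VALUES (1);
  BEGIN;
  SELECT id FROM mxburn WHERE id = 1 FOR KEY SHARE;

  # session 2: each psql invocation is its own transaction, so each
  # iteration adds a second locker to the row and allocates one new
  # mxid (assuming an otherwise-idle cluster, so nothing else creates
  # mxids behind your back)
  for i in $(seq 131071); do
    psql -qc "SELECT id FROM mxburn WHERE id = 1 FOR KEY SHARE" >/dev/null
  done

Afterward, COMMIT in session 1 before step 4, so the VACUUM FREEZE can
advance the cluster's oldest mxid all the way to NextMultiXactId.  On
a fresh cluster, where mxids start at 1, the loop leaves
NextMultiXactId at 131072, matching the expected state above.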