On 2015-05-29 15:08:11 -0400, Robert Haas wrote:
> It seems pretty clear that we can't effectively determine anything
> about member wraparound until the cluster is consistent.

I wonder if this doesn't actually hint at a bigger problem.

Currently, to determine where we need to truncate, SlruScanDirectory() is used. That, afaics, could actually be a problem during recovery when we're not consistent.

Consider the scenario where a very large database is copied while running. At the start of the backup we'll determine at which checkpoint recovery will start and store it in the label. After that the copy will start, copying everything slowly. That works because we expect recovery to fix things up. The problem I see WRT multixacts is that the copied state of pg_multixact could be wildly different from the one at the label's checkpoint. During recovery, before reaching the first checkpoint, we'll create multixact files as used at the time of the checkpoint. But the rest of pg_multixact may be ahead by 2**31 xacts.

For clog and such that's not a problem because the truncation etc. points are all stored in WAL - during recovery we just replay the truncations that happened on the master, there's no need to look at the data directory. And we won't access the clog before being consistent.

But multixacts are different. To avoid ending up with pg_multixact/*/* directories we have to do truncations during recovery. As there are currently no truncation records, we have to do that by scanning the data directory. But that state could be "from the future".

I considered for a second whether the solution for that could be to not truncate while inconsistent - but I think that doesn't solve anything, as then we can end up with directories where every single offsets/member file exists. We could possibly try to fix that by always truncating away slru segments in offsets that we know to be too old to exist in a valid database. But achieving the same for members fries my brain.
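
A toy sketch of the "from the future" concern described above, in Python rather than PostgreSQL code. It assumes a circular 32-bit "precedes" test based on the sign of the difference, and it models a directory scan as picking the oldest id by pairwise comparison. The function names and all the numbers are made up for illustration; this is not how SlruScanDirectory() is implemented.

```python
WRAP = 2**32
HALF = 2**31


def precedes(a, b):
    """Circular 'a is older than b': the signed 32-bit difference is negative."""
    return ((a - b) % WRAP) >= HALF


def oldest_by_scan(ids):
    """Pick the 'oldest' id by pairwise circular comparison (hypothetical scan)."""
    oldest = ids[0]
    for candidate in ids[1:]:
        if precedes(candidate, oldest):
            oldest = candidate
    return oldest


# State as of the checkpoint named in the backup label (hypothetical numbers).
at_checkpoint = [100, 200, 300]

# State seen mid-recovery: id 100 was recreated by replay, while the other
# files were copied much later, after the primary advanced by about 2**31.
mixed = [100, HALF + 150, HALF + 250]

consistent_oldest = oldest_by_scan(at_checkpoint)    # 100
inconsistent_oldest = oldest_by_scan(mixed)          # HALF + 150
```

With the mixed state, the file that replay just created no longer looks like the oldest one, so a truncation point derived from the scan would not match the checkpoint's view.
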
That kind of truncation also seems awfully risky.

I think at least for 9.5+ we should
a) invent proper truncation records for pg_multixact
b) start storing oldestValidMultiOffset in pg_control.

Scanning the directories to get knowledge we should already have is a pretty bad hack, and we should not continue using it forever. I think we might end up needing to do a) even in the backbranches.

Am I missing something?

This problem doesn't conflict with most of the fixes you describe, so I'll continue reviewing those.

Greetings,

Andres Freund

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general