On 2015-05-30 00:52:37 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>
> > I considered for a second whether the solution for that could be to not
> > truncate while inconsistent - but I think that doesn't solve anything as
> > then we can end up with directories where every single offsets/member
> > file exists.
>
> Hang on a minute.  We don't need to scan any files to determine the
> truncate point for offsets; we have the valid range for them in
> pg_control, as nextMulti + oldestMulti.  And using those end points, we
> can look for the offsets corresponding to each, and determine the member
> files corresponding to the whole set; it doesn't matter what other files
> exist, we just remove them all.  In other words, maybe we can get away
> with considering truncation separately for offset and members on
> recovery: do it like today for offsets (i.e. at each restartpoint), but
> do it only in TrimMultiXact for members.

Is oldestMulti, nextMulti - 1 really suitable for this? Are both
actually guaranteed to exist in the offsets slru and be valid?  Hm. I
guess you intend to simply truncate everything else, but just in
offsets?

> One argument against this idea is that we may not want to keep a full
> set of member files on standbys (due to disk space usage), but that's
> what will happen unless we truncate during replay.

I think that argument is pretty much the death-knell.

> > I think at least for 9.5+ we should a) invent proper truncation records
> > for pg_multixact b) start storing oldestValidMultiOffset in pg_control.
> > The current hack of scanning the directories to get knowledge we should
> > have is a pretty bad hack, and we should not continue using it forever.
> > I think we might end up needing to do a) even in the backbranches.
>
> Definitely agree with WAL-logging truncations; also +1 on backpatching
> that to 9.3.  We already have experience with adding extra WAL records
> on minor releases, and it didn't seem to have bitten too hard.

I'm inclined to agree. My only problem is that I'm not sure whether we
can find a way of doing all this without adding a pg_control field. Let
me try to sketch this out:

1) We continue using SlruScanDirectory(SlruScanDirCbFindEarliest) on the
   master to find the oldest offsets segment to truncate. Alternatively,
   if we determine it to be safe, we could use oldestMulti to find that.
2) SlruScanDirCbRemoveMembers is changed to return the range of members
   to remove, instead of doing the removal itself.
3) We WAL-log [oldest offset segment guaranteed to not be alive,
   nextmulti) for offsets, and [oldest members segment guaranteed to not
   be alive, nextmultioff) for members, and redo truncations for the
   entire range during recovery.

I'm pretty tired right now, but this sounds doable.

Greetings,

Andres Freund
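
To make step 3 a bit more concrete, here is a rough, standalone sketch
of what such a truncation record and its redo loop might look like. It
is not taken from any patch. Every name prefixed with "sketch_" or
"SKETCH_" is hypothetical, and the layout constants assume the default
8 kB block size (2048 offsets and 1636 members per page, 32 pages per
SLRU segment). The sketch deliberately does not decide which end points
the master logs; it only shows replaying a half-open segment range,
including the case where segment numbers wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout constants for the default 8 kB block size. */
#define SKETCH_PAGES_PER_SEGMENT 32u
#define SKETCH_OFFSETS_PER_PAGE  2048u  /* 8192 / sizeof(uint32_t) */
#define SKETCH_MEMBERS_PER_PAGE  1636u

/* Hypothetical WAL record: two half-open segment ranges [start, end). */
typedef struct SketchTruncateRecord
{
    uint32_t off_start_seg;  /* first offsets segment to remove */
    uint32_t off_end_seg;    /* first offsets segment to keep */
    uint32_t mem_start_seg;  /* first members segment to remove */
    uint32_t mem_end_seg;    /* first members segment to keep */
} SketchTruncateRecord;

/* Map a multixact id to the offsets segment that stores it. */
static uint32_t
sketch_offsets_segment(uint32_t multi)
{
    return multi / SKETCH_OFFSETS_PER_PAGE / SKETCH_PAGES_PER_SEGMENT;
}

/* Map a member offset to the members segment that stores it. */
static uint32_t
sketch_members_segment(uint32_t offset)
{
    return offset / SKETCH_MEMBERS_PER_PAGE / SKETCH_PAGES_PER_SEGMENT;
}

/* Callback standing in for "unlink this segment file". */
typedef void (*sketch_remove_fn) (uint32_t segno, void *arg);

/*
 * Replay one range: visit every segment in [start, end), wrapping at
 * nsegs. The callback may be NULL, in which case segments are only
 * counted. Returns the number of segments visited.
 */
static size_t
sketch_redo_range(uint32_t start, uint32_t end, uint32_t nsegs,
                  sketch_remove_fn remove_seg, void *arg)
{
    size_t   visited = 0;
    uint32_t seg;

    assert(nsegs > 0 && start < nsegs && end < nsegs);

    for (seg = start; seg != end; seg = (seg + 1) % nsegs)
    {
        if (remove_seg != NULL)
            remove_seg(seg, arg);
        visited++;
    }
    return visited;
}
```

Because the record carries the full range, a standby replaying it would
not need to scan the directories at all, which is the point of logging
the truncation explicitly instead of re-deriving it during recovery.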