On Fri, Dec 11, 2020 at 08:39:01AM -0500, Brian Foster wrote: > On Fri, Dec 11, 2020 at 08:50:04AM +1100, Dave Chinner wrote: > > On Thu, Dec 10, 2020 at 09:23:58AM -0500, Brian Foster wrote: > > > On Thu, Dec 10, 2020 at 07:51:32AM +1100, Dave Chinner wrote: > > > > For changing incompat log flags, covering also provides exactly what > > > > we need - an empty log with only a dirty superblock in the journal > > > > to recover. All that we need then is to recheck feature flags after > > > > recovery (not just clear log incompat flags) because we might need > > > > to use this to also set incompat feature flags dynamically. > > > > > > > > > > I'd love it if we use a better term for a log isolated to the superblock > > > buffer. So far we've discussed an empty log with a dummy superblock, an > > > empty log with a dirty superblock, and I suppose we could also have an > > > actual empty log. :P "Covered log," perhaps? > > > > On terminology: "covered log" has been around for 25 years > > in XFS, and it's very definition is "an empty, up-to-date log except > > for a dummy object we logged without any changes to update the log > > tail". And, by definition, that dummy object is dirty in the log Hmm, can we capture in a comment in the logging code the precise meaning of "covered log" in whatever patch we end up generating? I've long suspected that's more or less what a covered log meant (the log has one item just to prove that head==tail and there's nothing else to see here) but it sure would've been nice to confirm that. :) > > even if it contains no actual modifications when it was logged. > > So in this case, "dummy" is directly interchangable with "dirty" > > when looking at the log contents w.r.t. recovery behaviour. > > > > It's worth looking at a historical change, too. Log covering > > originally used the root inode and not the superblock. That caused > > problems on linux dirtying VFS inode state, so we changed it to the > > superblock in commit 1a387d3be2b3 ("xfs: dummy transactions should > > not dirty VFS state") about a decade ago. Note the name of the > > function at the time (xfs_fs_log_dummy()) and that it's description > > explicitly mentions that a dummy transaction used for updating the > > log tail when covering it. > > > > The only difference in this discussion is fact that we may be > > replacing the "dummy" with a modified superblock. Both are still > > dirty superblock objects in the log, and serve the same purpose for > > covering the log, but instead of using a dummy object we update the > > log incompat flags so taht the only thing that gets replayed if we > > crash is the modification to the superblock flags... > > > > Makes sense, thanks. > > > Finally, no, we can't have a truly empty log while the filesystem is > > mounted because log transaction records must not be empty. Further, > > we can only bring the log down to an "empty but not quite clean" > > state while mounted, because if the log is actually marked clean and > > we crash then log recovery will not run and so we will not clean up > > files that were open-but-unlinked when the system crashed. Whatever happened to Eric's patchset to make us process unlinked inodes at mount time so that freeze images wouldn't require recovery? > Yeah, I wasn't suggesting to do that, just surmising about terminology. > It sounds like "covered log" refers to the current state logging an > unmodified superblock object. > > > > > > > > > but didn't necessarily clean the log. I wonder if we can fit that state > > > > > > into the existing log covering mechanism by logging the update earlier > > > > > > (or maybe via an intermediate state..)? I'll need to go stare at that > > > > > > code a bit.. > > > > > > > > That's kinda what I'd like to see done - it gives us a generic way > > > > of altering superblock incompat feature fields and filesystem > > > > structures dynamically with no risk of log recovery needing to parse > > > > structures it may not know about. > > > > > > > > > > We might have to think about if we want to clear such feature bits in > > > all log covering scenarios (idle, freeze) vs. just unmount. > > > > I think clearing bits can probably be lazy. Set a state flag to tell > > log covering that it needs to do an update, and so when the covering > > code finally runs the feature bit modification just happens. > > > > That's reasonable on its own, it just means we have to support dynamic > setting of the associated bit(s) at runtime... The one downside of clearing the log incompat flags when the log goes idle is that right now I print a big EXPERIMENTAL warning whenever we turn on the atomic swapext feature, and doing this will cause that message to cycle over and over... That's kind of a minor point though. Either we add a superblock flag to remember that we've already warned about the experimental feature, or we just don't bother at all. Next time I have spare cycles I'll look into adapting this patch to make it clear log incompat flags at unmount and/or covering time, assuming nobody beats me to it. > > > My previous suggestion in the other sub-thread was to set bits on > > > mount and clear on unmount because that (hopefully) simplifies the > > > broader implementation. Otherwise we need to manage dynamic > > > setting of bits when the associated log items are active, and that > > > still isn't truly representative because the bits would remain set > > > long after active items fall out of the log, until the log is > > > covered. Either way is possible, but I'm curious what others > > > think. > > > > I think that unless we are setting a log incompat flag, log covering > > should unconditionally clear the log incompat flags. Because the log > > is empty, and recovery of a superblock buffer should -always- be > > possible, then by definition we have no log incompat state > > present.... > > > > It should, but in practice will depend on whether the superblock was > written back or not before a crash while in the covered state. So > recovery may or may not actually work on an older kernel after the log > is covered. I think that is fine from a correctness standpoint because > the important part is to make sure an older kernel will not attempt to > recover incompatible items, and we make that guarantee by covering the > log before clearing the bit. I'm merely pointing this out because I > still think it's more straightforward to just enforce that a kernel with > the associated feature bit be required to recover a dirty log. <nod> > > As for a mechanism for dynamically adding log incompat flags? > > Perhaps we just do that in xfs_trans_alloc() - add an log incompat > > flags field into the transaction reservation structure, and if > > xfs_trans_alloc() sees an incompat field set and the superblock > > doesn't have it set, the first thing it does is run a "set log > > incompat flag" transaction before then doing it's normal work... > > > > This should be rare enough it doesn't have any measurable > > performance overhead, and it's flexible enough to support any log > > incompat feature we might need to implement... > > > > But I don't think that is sufficient. As Darrick pointed out up-thread, > the updated superblock has to be written back before we're allowed to > commit transactions with incompatible items. Otherwise, an older kernel > can attempt log recovery with incompatible items present if the > filesystem crashes before the superblock is written back. > > We could do some sync transaction and/or sync write dance at runtime, > but I think the performance/overhead aspect becomes slightly less > deterministic. It's not clear to me how many bits we'd support over > time, and whether users would notice hiccups when running some sustained > workload and happen to trigger sync transaction/write or AIL push > sequences to set internal bits. Probably few, since I imagined that most new log items are going to get built on behalf of new disk format features. OTOH Allison and I are both building features that don't change the disk format... > My question is how flexible do we really need to make incompatible log > recovery support? Why not just commit the superblock once at mount time > with however many bits the current kernel supports and clear them on > unmount? (Or perhaps consider a lazy setting variant where we set all > supported bits on the first modification..?) I don't think setting it at mount (or first mod) time is a good idea either, since that means that the fs cannot be recovered on an old kernel even if we never actually used the new log items. --D > > Brian > > > Cheers, > > > > Dave. > > -- > > Dave Chinner > > david@xxxxxxxxxxxxx > > >