Re: [QUESTION] Upgrade xfs filesystem to reflink support?

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 11 May 2022 08:05:23 +1000

On Tue, May 10, 2022 at 12:02:12PM -0700, Darrick J. Wong wrote:
> On Tue, May 10, 2022 at 09:21:03AM +0300, Amir Goldstein wrote:
> > On Mon, May 9, 2022 at 9:20 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > > I think the upcoming nrext64 xfsprogs patches took in the first patch in
> > > that series.
> > >
> > > Question: Now that mkfs has a min logsize of 64MB, should we refuse
> > > upgrades for any filesystem with logsize < 64MB?
> > 
> > I think that would make a lot of sense. We do need to reduce the upgrade
> > test matrix as much as we can, at least as a starting point.
> > Our customers would have started with at least 1TB fs, so should not
> > have a problem with minimum logsize on upgrade.
> > 
> > BTW, in LSFMM, Ted had a session about "Resize patterns" regarding the
> > practice of users to start with a small fs and grow it, which is encouraged by
> > Cloud providers pricing model.
> > 
> > I had asked Ted about the option to resize the ext4 journal and he replied
> > that in theory it could be done, because the ext4 journal does not need to be
> > contiguous. He thought that it was not the case for XFS though.
> 
> It's theoretically possible, but I'd bet that making it work reliably
> will be difficult for an infrequent operation.  The old log would probably
> have to clean itself, and then write a single transaction containing
> both the bnobt update to allocate the new log as well as an EFI to erase
> it.  Then you write to the new log a single transaction containing the
> superblock and an EFI to free the old log.  Then you update the primary
> super and force it out to disk, un-quiesce the log, and finish that EFI
> so that the old log gets freed.
> 
> And then you have to go back and find the necessary parts that I missed.

For one, you need a new log transaction that says "the new log is
over there", so that log recovery knows the old log is being replaced
and can go find the new log and recover it to free the old log.

IOWs, there's a heap of log recovery work needed, a new
intent/transaction type, futzing with feature bits because old
kernels won't be able to recover such an operation, etc.
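
To illustrate the feature bit part, here is a minimal sketch of how a
log-incompat style bit could gate recovery. Everything in it is an
assumption made for illustration: the hyp_ names, the struct and the bit
value are hypothetical and are not existing XFS definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit meaning "a log switch may be in progress". */
#define HYP_LOG_INCOMPAT_SWITCH	(1u << 0)

/* Hypothetical stand-in for the superblock field holding such bits. */
struct hyp_sb {
	uint32_t log_incompat;
};

/* A kernel may replay the log only if it understands every bit set. */
static inline bool hyp_can_recover(const struct hyp_sb *sb,
				   uint32_t known_bits)
{
	return (sb->log_incompat & ~known_bits) == 0;
}

/* Set the bit before the first switch transaction reaches the log. */
static inline void hyp_begin_switch(struct hyp_sb *sb)
{
	sb->log_incompat |= HYP_LOG_INCOMPAT_SWITCH;
}

/* Clear it only once the switch has completed and the log is clean. */
static inline void hyp_end_switch(struct hyp_sb *sb)
{
	sb->log_incompat &= ~HYP_LOG_INCOMPAT_SWITCH;
}
```

An old kernel would call hyp_can_recover() with known_bits == 0 and
refuse to mount a filesystem that crashed mid-switch, rather than
replaying a log it cannot interpret.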

Then there are interesting issues that haven't ever been considered,
like having a discontiguity in the LSN as we physically switch logs.
What cycle number does the new log start at? What happens to all the
head and tail tracking fields when we switch to the new log? What
about all the log items in the AIL, which is ordered by LSN? What
about all the active log items that track a specific LSN for
recovery integrity purposes (e.g. inode allocation buffers)? What
about updating the reservation grant heads that track log space
usage? And all the static size calculations used by the log code
have to be updated before the new log can be written to via iclogs.
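
To make the cycle number question concrete, here is a minimal sketch,
assuming the usual LSN layout of a 32-bit cycle number in the high half
and a 32-bit block offset in the low half. The names (lsn_t, lsn_pack,
new_log_start_lsn) are illustrative stand-ins rather than the kernel's
own macros, and "bump the cycle" is only one assumed policy for making
the new log sort after the old one.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t lsn_t;	/* illustrative stand-in for the LSN type */

/* Pack a cycle number (high 32 bits) and block offset (low 32 bits). */
static inline lsn_t lsn_pack(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

static inline uint32_t lsn_cycle(lsn_t lsn)
{
	return (uint32_t)((uint64_t)lsn >> 32);
}

static inline uint32_t lsn_block(lsn_t lsn)
{
	return (uint32_t)lsn;
}

/* Order LSNs by cycle first, then by block; returns <0, 0 or >0. */
static inline int lsn_cmp(lsn_t a, lsn_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

/*
 * Assumed policy: start the new log one cycle past the old head so
 * that every LSN it issues sorts after every LSN from the old log.
 */
static inline lsn_t new_log_start_lsn(lsn_t old_head)
{
	assert(lsn_cycle(old_head) != UINT32_MAX);	/* no wraparound */
	return lsn_pack(lsn_cycle(old_head) + 1, 0);
}
```

If the new log naively started at cycle 1, an item stamped at cycle 7 in
the old log would sort after everything the new log writes, and anything
ordered by LSN would be wrong. Bumping the cycle keeps the ordering
monotonic, but the block part of the old LSNs still refers to offsets in
a log that no longer exists.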

The allocation of the new log extent and the freeing of the old log
extent is the easy bit. Handling the failure cases to provide an
atomic, always recoverable switch and managing all the runtime state
and accounting changes that are necessary is the hard part...
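
As a rough sketch of the failure cases, the decision recovery would face
after a crash could look something like the following. This is a
thought experiment under stated assumptions, not a design: the states,
names and actions are all hypothetical, and it ignores the runtime
state problems entirely.

```c
#include <stdbool.h>

/* Hypothetical summary of what recovery finds on disk after a crash. */
struct hyp_crash_state {
	bool sb_points_at_new_log;	/* primary super already updated */
	bool old_log_has_switch_rec;	/* "new log is over there" logged */
	bool new_log_has_commit;	/* new log holds its first commit */
};

enum hyp_action {
	HYP_RECOVER_OLD_ONLY,		/* switch never started */
	HYP_ROLL_BACK_NEW_ALLOC,	/* free the half-built new log */
	HYP_FOLLOW_TO_NEW_LOG,		/* go to new log, then free old */
	HYP_RECOVER_NEW_FREE_OLD,	/* replay new log, finish the EFI */
};

/* Pick a recovery action for each point at which the crash may land. */
static inline enum hyp_action hyp_recovery_action(struct hyp_crash_state s)
{
	if (s.sb_points_at_new_log)
		return HYP_RECOVER_NEW_FREE_OLD;
	if (!s.old_log_has_switch_rec)
		return HYP_RECOVER_OLD_ONLY;
	if (s.new_log_has_commit)
		return HYP_FOLLOW_TO_NEW_LOG;
	return HYP_ROLL_BACK_NEW_ALLOC;
}
```

Every branch has to leave the filesystem with exactly one valid log and
no leaked extent, whichever write the crash interrupted.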

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx