Re: [PATCH 2/2] xfs: don't dirty snapshot logs for unlinked inode recovery

Gao Xiang <hsiangkao@xxxxxxxxxx> · Tue, 23 Feb 2021 23:58:30 +0800

On Tue, Feb 23, 2021 at 09:46:38AM -0600, Eric Sandeen wrote:
> 
> 
> On 2/23/21 9:03 AM, Gao Xiang wrote:
> > On Tue, Feb 23, 2021 at 08:40:56AM -0600, Eric Sandeen wrote:
> >> On 2/23/21 7:42 AM, Gao Xiang wrote:
> >>> Hi folks,
> >>>
> >>> On Wed, Mar 28, 2018 at 08:17:28AM +1100, Dave Chinner wrote:
> >>>> On Mon, Mar 26, 2018 at 08:46:49AM -0400, Brian Foster wrote:
> >>>>> On Sat, Mar 24, 2018 at 09:20:49AM -0700, Darrick J. Wong wrote:
> >>>>>> On Wed, Mar 07, 2018 at 05:33:48PM -0600, Eric Sandeen wrote:
> >>>>>>> Now that unlinked inode recovery is done outside of
> >>>>>>> log recovery, there is no need to dirty the log on
> >>>>>>> snapshots just to handle unlinked inodes.  This means
> >>>>>>> that readonly snapshots can be mounted without requiring
> >>>>>>> -o ro,norecovery to avoid the log replay that can't happen
> >>>>>>> on a readonly block device.
> >>>>>>>
> >>>>>>> (unlinked inodes will just hang out in the agi buckets until
> >>>>>>> the next writable mount)
> >>>>>>
> >>>>>> FWIW I put these two in a test kernel to see what would happen and
> >>>>>> generic/311 failures popped up.  It looked like the _check_scratch_fs
> >>>>>> found incorrect block counts on the snapshot(?)
> >>>>>>
> >>>>>
> >>>>> Interesting. Just a wild guess, but perhaps it has something to do with
> >>>>> lazy sb accounting..? I see we call xfs_initialize_perag_data() when
> >>>>> mounting an unclean fs.
> >>>>
> >>>> The freeze calls xfs_log_sbcount(), which should update the
> >>>> superblock counters from the in-memory counters and write them to
> >>>> disk.
> >>>>
> >>>> If they are out, I'm guessing it's because the in-memory per-ag
> >>>> reservations are not being returned to the global pool before the
> >>>> in-memory counters are summed during a freeze....
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Dave.
> >>>> -- 
> >>>> Dave Chinner
> >>>> david@xxxxxxxxxxxxx
> >>>
> >>> I spent some time tracking this problem. I've made a quick
> >>> modification to per-AG reservation and tested it with generic/311;
> >>> it seems fine. My current question is how such fsfreezed
> >>> images (with a clean mount) work with old kernels without [PATCH 1/1].
> >>> I'm afraid orphan inodes won't be freed with such old kernels....
> >>> Am I missing something?
> >>
> >> It's true, a snapshot created with these patches will not have its unlinked
> >> inodes processed if mounted on an older kernel. I'm not sure how much of a
> >> problem that is; the filesystem is not inconsistent, but some space is lost,
> >> I guess. I'm not sure it's common to take a snapshot of a frozen filesystem on
> >> one kernel and then move it back to an older kernel.  Maybe others have
> >> thoughts on this.
> > 
> > My current thought is to write a clean log when freezing only if
> > there are no unlinked inodes, but to leave the log dirty if any
> > unlinked inodes exist, as Brian mentioned before, and not to
> > handle that case (?). I'd like to hear more comments about
> > this as well.
> 
> I don't know if I had made this comment before ;) but I feel like that's even
> more "surprise" (as in: gets further from the principle of least surprise)
> and TBH I would rather not have that somewhat unpredictable behavior.
> 

Yeah, I saw that comment as well....

> I think I'd rather /always/ make a dirty log than sometimes do it, other
> times not. It'd just be more confusion for the admin IMHO.

Ok, the other alternative approaches I can think of aren't trivial
(e.g. some hack in log recovery, etc.). Any ideas / thoughts about
this are welcome :) Thanks!

Thanks,
Gao Xiang
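
For reference, Dave's guess quoted in this thread (per-AG reservations
not being returned to the global pool before the in-memory counters
are summed during a freeze) can be illustrated with a toy model. This
is only a sketch, not XFS code: the ToyFs class, its fields, and the
release_first flag are hypothetical names made up for illustration,
and the model simply assumes that blocks held by per-AG reservations
are excluded from the global in-memory free count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyFs:
    """Toy model of free-block accounting; NOT the XFS implementation."""

    total_free: int  # blocks that a checker would find free on disk
    # Blocks currently held by in-memory per-AG reservations (assumption
    # of this model: they are hidden from the global free counter).
    ag_reserved: List[int] = field(default_factory=list)

    def in_memory_fdblocks(self) -> int:
        # Global in-memory free count, excluding reserved blocks.
        return self.total_free - sum(self.ag_reserved)

    def release_reservations(self) -> None:
        # Return every per-AG reservation to the global pool.
        self.ag_reserved = [0] * len(self.ag_reserved)

    def freeze(self, release_first: bool) -> int:
        # Return the free-block count that would be written to the
        # superblock at freeze time in this model.
        if release_first:
            self.release_reservations()
        return self.in_memory_fdblocks()


if __name__ == "__main__":
    stale = ToyFs(total_free=1000, ag_reserved=[10, 20, 30])
    fixed = ToyFs(total_free=1000, ag_reserved=[10, 20, 30])
    print("summed before release:", stale.freeze(release_first=False))
    print("summed after release: ", fixed.freeze(release_first=True))
```

In this model, summing before the reservations are released records a
smaller free-block count than a later scan of the snapshot would find,
which is the kind of mismatch _check_scratch_fs reported in
generic/311. Whether this matches the real cause is exactly what the
thread is still trying to establish.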

> 
> Thanks,
> -Eric
> 
> > Thanks,
> > Gao Xiang
> > 
> >>
> >> -Eric
> >>
> > 
>