Re: [PATCH 0/3] xfs: sparse inodes overlap end of filesystem

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Thu, 31 Oct 2024 15:13:40 -0700

On Thu, Oct 31, 2024 at 12:44:12PM +0100, Carlos Maiolino wrote:
> On Tue, Oct 29, 2024 at 11:14:18AM -0500, Eric Sandeen wrote:
> > On 10/23/24 9:51 PM, Dave Chinner wrote:
> > > There is one question that needs to be resolved in this patchset: if
> > > we take patch 2 to allow sparse inodes at the end of the AG, why
> > > would we need the change in patch 1? Indeed, at this point I have to
> > > ask why we even need the min/max agbno guidelines to the inode chunk
> > > allocation as we end up allowing any aligned location in the AG to
> > > be used by sparse inodes. i.e. if we take patch 2, then patch 1 is
> > > unnecessary and now we can remove a bunch of code (min/max_agbno
> > > constraints) from the allocator paths...
> > > 
> > > I'd prefer that we take the latter path: ignore the first patch.
> > > This results in more flexible behaviour, allows existing filesystems
> > > with this issue to work without needing xfs_repair to fix them, and
> > > we get to remove complexity from the code.
> > > 
> > > Thoughts?
> > 
> > For some reason I'm struggling to grasp some of the details here, so
> > maybe I can just throw out a "what I think should happen" type response.
> > 
> > A concern is that older xfs_repair binaries will continue to see
> > inodes in this region as corrupt, and throw them away, IIUC - even
> > if the kernel is updated to handle them properly.
> > 
> > Older xfs_repair could be encountered on rescue CDs/images, maybe
> > even in initramfs environments, by virt hosts managing guest filesystems,
> > etc.
> > 
> > So it seems to me that it would be worth it to prevent any new inode
> > allocations in this region going forward, even if we *can* make it work,
> > so that we won't continue to generate what looks like corruption to older
> > userspace.
> > 
> > That might not be the most "pure" upstream approach, but as a practical
> > matter I think it might be a better outcome for users and support
> > orgs... even if distros update kernels & userspace together, that does
> > not necessarily prevent older userspace from encountering a filesystem
> > with inodes in this range and trashing them.
> >
> 
> I'm inclined to agree with Eric here as preventing the sparse inodes to be
> allocated at the edge of the runt AG sounds the most reasonable approach to me.
> It just seems to me yet another corner case to deal with for very little benefit,
> i.e to enable a few extra inodes, on a FS that seems to be in life support
> regarding space for new inodes, whether it's a distro kernel or upstream kernel.
> 
> It kind of seem risky to me, to allow users to run a new kernel, allocate inodes
> there, fill those inodes with data, just to run a not yet ready xfs_repair, and
> discard everything there. Just seems like a possible data loss vector.

I agree.  I think we have to fix the fsck/validation code to allow
inode clusters that go right up to EOAG on a runt AG because current
kernels write out filesystems that way.  I also think we have to take
the other patch that prevents the inode allocator from creating a chunk
that crosses EOAG so that unpatched xfs_repairs won't trip over newer
filesystems.

> Unless - and I'm not sure how reasonable it is -, we first release a new
> xfsprogs, preventing xfs_repair to rip off those inodes, and later update the
> kernel. But this will end up on users hitting a -EFSCORRUPTED every attempt to
> allocate inodes from the FS edge.
> 
> How feasible would be to first prevent inodes to be allocated at the runt AG's
> edge, let it sink for a while, and once we have a fixed xfs_repair for some
> time, we then enable inode allocation on the edge, giving enough time for users
> to have a newer xfs_repair?
> 
> Again, I'm not sure it it does make sense at all, hopefully it does.

I don't think we can ever re-enable inode allocation on the edge no
matter how much time goes by.

--D

> 
> Carlos
>