Re: [PATCH v2] ext4: set csum seed in tmp inode while migrating to extents

Lukas Czerner <lczerner@xxxxxxxxxx> · Fri, 17 Dec 2021 10:35:34 +0100

On Thu, Dec 16, 2021 at 01:32:14PM -0500, Theodore Ts'o wrote:
> On Wed, Dec 15, 2021 at 03:12:37PM +0100, Lukas Czerner wrote:
> > > Run fsck of course! And then recover from backups :) I know this is sad but
> > > the situation is that our migration code just is not crash-safe (if we
> > > crash we are going to free blocks that are still used by the migrated
> > > inode) and Luis makes it work in case we do not crash (which should be
> > > hopefully more common) and documents it does not work in case we crash.
> > > So overall I'd call it a win.
> > > 
> > > But maybe we should just remove this online-migration functionality
> > > completely from the kernel? That would be also a fine solution for me. I
> > > was thinking whether we could somehow make the inode migration crash-safe
> > > but I didn't think of anything which would not require on-disk format
> > > change...
> > 
> > Since this is not something that anyone can honestly recommend doing
> > without a prior backup and a word of warning I personaly would be in favor
> > of removing it.
> 
> So there are a couple options that we could pursue:
> 
> 1) We could change the migrate code to stop putting the orphan inode
> on the orphan list.  If we do this, an crash in the middle of the
> migrate will, in the worst case (when the migration isn't completed
> within a single jbd2 transaction) result in a leaked inode.  That's
> not ideal, but it won't lead user data loss, and e2fsck will recover
> the situation by cloning the blocks, and leaving the inode in
> lost+found.
> 
> 2) We could try to ensure migration happens all within a single
> transaction, if they all fit inside a the inode structure, we allocate
> a tmp inode for all of the indirect blocks, attach the blocks to the
> tmp inode, place the tmp inode on the orphan list, and put all of that
> on a single handle, and then in a second handle, truncate the tmp
> inode to release the indirect blocks.  If we need to allocate extent
> tree blocks, then all of that would need to fit in a single
> transaction, and it's a bit more complicated, but it is doable.
> 
> 3) We can simply remove the inode migration feature by removing
> EXT4_EXTENTS_FL from EXT4_FL_USER_MODIFIABLE, and changing the
> implementation of the EXT4_IOC_MIGRATE ioctl to return EOPNOTSUPP, and
> then cleaning up the code paths that are now unreachable.
> 
> The migration feature is clearly less compelling than it was ten years
> ago, when ext4 was first introduced --- and most enterprise distros
> have never supported the feature even when it has existed.  Also on
> the plus side, we've never shipped a program to globally migrate a
> file system by using ioctl interface.
> 
> On the other hand, there may have been user shell scripts that have
> done something like "find /mntpt -type f -print0 | xargs -0 chattr +e {} \;"
> And so option #3 could be construed as "breaking userspace", especially
> without a deprecation window.
> 
> Furthermore, Option #1 is pretty simple to implement, and chances of a
> migration getting spread across two jbd2 commits is not actually
> pretty low.  And if it does happen, there would only be a single inode
> that would get its blocks cloned and attached to lost+found.
> 
> Thats being said, if we *did* option #1, in the long run we'd want to
> land a complete solution, which would either be something like option
> #2, or allocating a flag to give a hint to e2fsprogs that if it does
> find an leaked inode with with the flag set on the on-disk inode, that
> all it needs to do is to zero out the inode and be done with it.
> 
> So the question is, is it worth it to continue supporting the migrate
> feature, or should we just delete all of the migration code, and risk
> users complaining that we've broken their use case?  The chances of
> that happening is admittedly low, and Linus's rule that "it's only
> breaking userspace if a user complains" means we might very well get
> away with it.  :-)

That's a very good summary Ted, thanks.

Our rationale behind not supporting the migration was always the fact
that we felt that backup was absolutely necessary before operation like
this. When you already have up-to-date backup available you might as
well create a fresh ext4 file system with all the advantages it brings
and recover data from said backup. I think this is still a very
reasonable approach.

I have no doubt it was useful featrure in the early days of ext4, but I
think we're well past that. Any attempt to rework or fix the feature
assumes it's still useful and has it's users today and into the future.
I very much doubt that, so let's test it. Let's start a year long
deprecation window.

-Lukas

> 
> 						- Ted
>