Re: ext2fs_link() corrupting a directory

Jan Kara <jack@xxxxxxx> · Thu, 30 Jan 2020 17:25:30 +0100

On Fri 17-01-20 11:29:27, Theodore Y. Ts'o wrote:
> On Fri, Jan 17, 2020 at 01:24:20PM +0100, Jan Kara wrote:
> > 
> > I was tracking down a filesystem corruption issue with one proposed
> > xfstests testcase. After some debugging I've found out that the problem
> > actually is not in the kernel but rather in e2fsck (libext2fs
> > respectively). The testcase deletes lost+found, e2fsck recreates it. But
> > after the testcase / is h-tree directory. So ext2fs_link() creates
> > lost+found in / and clears EXT4_INODE_INDEX flag. Now because the
> > filesystem has metadata checksums, clearing the index flag needs to also
> > rewrite all directory blocks with h-tree index blocks because the layout
> > now needs to be different.  ext2fs_link() actually tries to do this in its
> > link_proc() but if the space for new directory entry is found before we
> > walk all the h-tree index blocks, we terminate the iteration and some index
> > blocks remain unconverted resulting in checksum errors and other weirdness
> > later on.
> 
> Ouch.  Nice catch.
> 
> I suspect we have a similar problem when the kernel discovers that the
> htree is corrupted; it will clear EXT4_INODE_INDEX flag, since before
> how we added metadata checksum, it was safe to do that.  Given our
> choice of where the stash the checksum, in order to not decrease the
> fanout of htree directories, this is no no longer safe to do.
> 
> I think what we will need to do is to set an in-memory "fallback"
> flag, which simply ignores the index blocks, but doesn't actually clear
> the EXT4_INODE_INDEX flag in the case where metadata_csum is enabled.

Yeah, that might be a good fix for the kernel.

> > The question is how to best fix this. The easiest fix is to just make
> > link_proc() iterate through all directory blocks when it needs to do the
> > index blocks conversion. But this seems somewhat stupid. Also there's
> > another problem with clearing EXT4_INODE_INDEX in ext2fs_link() - if the
> > directory has more than 65000 subdirectories, clearing EXT4_INODE_INDEX is
> > not allowed because large directory link count handling is supported only
> > for EXT4_INODE_INDEX directories.
> > 
> > So what do we do with this? For e2fsck, we could just link the new entry
> > into the directory and force rehashing later. But ext2fs_link() can be
> > called also from other tools and it should be a self-contained API... Any
> > ideas? Should we just bite the bullet and implement ext2fs_link() for
> > h-tree dirs properly?
> 
> Yes, I think that's what we are going to need to do; we had cheated
> and not bothered to implement htree support in ext2fs library, but if
> we're going to implement it for link_proc, we might as well also
> implement it for lookups as well.

JFYI (since I won't make today's ext4 call) I have full-featured insertion
into indexed dirs implemented, just need to write some decent testing
because it's quite a bit of new code and not exactly trivial...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR