Re: [PATCH -next] ext4: avoid remove directory when directory is corrupted

Jan Kara <jack@xxxxxxx> · Fri, 24 Jun 2022 15:34:29 +0200

On Thu 23-06-22 11:01:58, Andreas Dilger wrote:
> On Jun 22, 2022, at 3:02 AM, Ye Bin <yebin10@xxxxxxxxxx> wrote:
> > 
> > Now if check directoy entry is corrupted, ext4_empty_dir may return true
> > then directory will be removed when file system mounted with "errors=continue".
> > In order not to make things worse just return false when directory is corrupted.
> 
> This will make corrupted directories undeletable, which might cause problems
> for applications also (e.g. tar or rsync always hitting errors when walking
> the tree) and the user may prefer to delete the directory and recreate it
> rather than having a permanent error in the filesystem.

Well, I guess an argument could be made that in such case users should
rather run e2fsck and *that* should remove the error from the filesystem.
It isn't like we allow other metadata corruptions to be papered over by
hiding them. I know we have this policy "corrupted dirs can be deleted"
since basically forever but in retrospection it does not seem particularly
good one to me.

> With your patch it would always return "false" if a directory block hits a
> corrupted entry instead of checking the rest of the blocks in the directory.
> Since e2fsck would put the entries from the broken block into lost+found,
> it isn't clear that the full/empty decision should be made by the presence
> of a corrupted leaf block either way.
> 
> Looking at the ext4_empty_dir() code, it looks like there are a few cases
> where it might return "true" when the directory actually has entries in it,
> but your patch doesn't address those.  IMHO, errors like the absence of "."
> and ".." should *NOT* cause the directory to be marked "empty", but it should
> continue checking blocks to see if there are valid entries.  However, Jan
> added these checks in 64d4ce8923 ("ext4: fix ext4_empty_dir() for directories
> with holes") to avoid looping forever when i_size is large and there are no
> allocated blocks in the directory, so they shouldn't just be removed, but
> they also do not fix the problem if i_size is corrupt but the first block of
> the inode is valid.
> 
> 
> It might make sense to change ext4_empty_dir() to iterate only leaf blocks
> actually allocated in the inode, rather than walking the whole of i_size by
> offset?  That would avoid the "spin forever on a huge sparse inode" problem
> that was the original reason for the addition of "." and ".." checks, and
> give a better determination of whether the directory is actually empty.
> 
> If there are only corrupt blocks or holes in the directory there is no reason
> *not* to delete it, but if there *are* valid entries (even if "." or ".." are
> missing) then the directory should not be deletable, since e2fsck will repair
> missing "." and ".." without clobbering the whole directory.

So I agree this would be a sane option as well but honestly I'm not sure
the complications are worth it. IMHO "corrupted dir is undeletable" is OK
policy because simple things are harder to break ;)...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR