On Tue 24-08-21 13:14:09, Theodore Ts'o wrote:
> I've been running some tests exercising the orphan_file code, and
> there are a number of failures:
>
> ext4/orphan_file: 512 tests, 3 failures, 25 skipped, 7325 seconds
>   Failures: ext4/044 generic/475 generic/643
> ext4/orphan_file_1k: 524 tests, 6 failures, 37 skipped, 8361 seconds
>   Failures: ext4/033 ext4/044 ext4/045 generic/273 generic/476 generic/643
>
> generic/643 is the iomap swap failure, and can be ignored.
> generic/475 is a pre-existing test flake that involves simulated disk
> failures, which we can also ignore in the context of orphan_file.
>
> However, ext4/044 is one that looks... interesting:
>
> root@kvm-xfstests:~# e2fsck -fn /dev/vdc
> e2fsck 1.46.4-orphan-file (22-Aug-2021)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Orphan file (inode 12) block 0 is not clean.
> Clear? no
>
> Failed to initialize orphan file.
> Recreate? no
>
> This is highly reproducible, and involves using a file system config
> that is probably a little unusual:
>
> Filesystem features: has_journal ext_attr resize_inode dir_index orphan_file filetype sparse_super large_file
>
> (This was created using "mke2fs -t ext3 -O orphan_file".)

Interesting. I don't see how the orphan handling code gets used at all for
this test. Hrm. Actually it seems to be a bug in the tools themselves,
because just running "mke2fs -t ext3 -O orphan_file" followed by
"e2fsck -f" reproduces exactly this failure. It seems that when I added the
physical block number to the orphan file block checksum, I broke e2fsck for
file systems where metadata_csum is disabled. I've fixed the bug now
(relative diff attached; I can resend the full series once the other bugs
are dealt with as well).
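A minimal sketch of the failure mode, under stated assumptions: it assumes e2fsck builds the contents a clean orphan file block should have and compares them byte for byte with the block on disk, and that without metadata_csum the on-disk checksum field is simply left at zero. The block layout, the sketch_* helpers, SKETCH_MAGIC and the toy FNV-1a hash are all hypothetical stand-ins; the hash is not the real ext2fs_do_orphan_file_block_csum().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified orphan file block: 32-bit inode slots followed
 * by a tail holding a magic and a checksum. Not the real on-disk layout. */
#define SKETCH_BLOCK_SIZE 1024u
#define SKETCH_MAGIC 0x12345678u /* placeholder, not the real magic */

struct sketch_orphan_tail {
	uint32_t ob_magic;
	uint32_t ob_checksum;
};

#define SKETCH_SLOTS_BYTES (SKETCH_BLOCK_SIZE - sizeof(struct sketch_orphan_tail))

/* Toy checksum: FNV-1a over the slot area, then mixed with the physical
 * block number. Stands in for the real e2fsprogs checksum routine. */
static uint32_t sketch_block_csum(const uint8_t *buf, uint64_t blk)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < SKETCH_SLOTS_BYTES; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h ^ (uint32_t)blk ^ (uint32_t)(blk >> 32);
}

/* Write the tail with memcpy to avoid alignment/aliasing problems. */
static void sketch_set_tail(uint8_t *buf, uint32_t csum)
{
	struct sketch_orphan_tail t = { SKETCH_MAGIC, csum };

	memcpy(buf + SKETCH_SLOTS_BYTES, &t, sizeof(t));
}

/* Fixed behaviour: empty slots, magic, and a checksum only when
 * metadata_csum is enabled (otherwise the field stays zero). */
static void sketch_make_clean_block(uint8_t *buf, uint64_t blk,
				    bool metadata_csum)
{
	memset(buf, 0, SKETCH_BLOCK_SIZE);
	sketch_set_tail(buf, metadata_csum ? sketch_block_csum(buf, blk) : 0);
}

/* Buggy behaviour: the checksum is written unconditionally. */
static void sketch_make_clean_block_buggy(uint8_t *buf, uint64_t blk)
{
	memset(buf, 0, SKETCH_BLOCK_SIZE);
	sketch_set_tail(buf, sketch_block_csum(buf, blk));
}

/* A block counts as "clean" if it matches the expected contents exactly. */
static bool sketch_block_is_clean(const uint8_t *on_disk,
				  const uint8_t *expected)
{
	assert(on_disk != NULL && expected != NULL);
	return memcmp(on_disk, expected, SKETCH_BLOCK_SIZE) == 0;
}
```

With metadata_csum off, the buggy builder puts a nonzero checksum into the expected buffer, so a freshly created block with a zero checksum field no longer compares equal and is reported as "not clean". Gating the update on the feature, as the attached diff does with ext2fs_has_feature_metadata_csum(), restores the match, while with metadata_csum on the checksum still depends on the block number.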
> The orphan_file_1k failures seem to involve running out of space in
> the orphan_file, and the fallback to using the old-fashioned orphan
> list seems to return ENOSPC? For example, from ext4/045:
>
> +mkdir: No space left on device
> +Failed to create directories - 19679
>
> ext4/045 creates a lot of directories by calling mkdir (ext4/045 tests
> creating more than 65000 subdirectories in a directory), and so this
> seems to be triggering a failure?

Strange. I don't see how the ext4/045 workload could run out of space in
the orphan file (and in fact I did test that the fallback works correctly
when we run out of space in the orphan file). Anyway, I'll look into it.
Thanks for the reports!

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
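The expected fallback behaviour can be sketched as follows, assuming the semantics described in the thread: when the orphan file has no free slot, the inode goes on the old-fashioned orphan list instead of the operation failing with ENOSPC. All sketch_* names and the four-slot capacity are hypothetical; this is an illustration, not the ext4 kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: a tiny orphan file with a fixed number of slots
 * (0 marks a free slot) plus a legacy orphan list that only records
 * its head and length. */
#define SKETCH_ORPHAN_SLOTS 4

struct sketch_fs {
	unsigned int orphan_file[SKETCH_ORPHAN_SLOTS];
	unsigned int list_head;
	size_t list_len;
};

/* Claim a free slot in the orphan file, or report that it is full. */
static int sketch_orphan_file_add(struct sketch_fs *fs, unsigned int ino)
{
	for (size_t i = 0; i < SKETCH_ORPHAN_SLOTS; i++) {
		if (fs->orphan_file[i] == 0) {
			fs->orphan_file[i] = ino;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Legacy path: record the inode as the new list head. A real list also
 * chains to the previous head, which this sketch omits. */
static int sketch_orphan_list_add(struct sketch_fs *fs, unsigned int ino)
{
	fs->list_head = ino;
	fs->list_len++;
	return 0;
}

/* Prefer the orphan file; on -ENOSPC fall back to the list instead of
 * propagating the error to the caller (e.g. mkdir). */
static int sketch_orphan_add(struct sketch_fs *fs, unsigned int ino)
{
	int err;

	assert(ino != 0); /* 0 is reserved as the free-slot marker */
	err = sketch_orphan_file_add(fs, ino);
	if (err != -ENOSPC)
		return err;
	return sketch_orphan_list_add(fs, ino);
}
```

If the fallback branch is missing or the -ENOSPC check is wrong, the error leaks out to user space, which would look like the "mkdir: No space left on device" output above; that is one hypothesis consistent with the report, not a diagnosis.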
diff --git a/e2fsck/super.c b/e2fsck/super.c
index 6964e2ddae39..d1da2c16bb02 100644
--- a/e2fsck/super.c
+++ b/e2fsck/super.c
@@ -578,11 +578,9 @@ static int reinit_orphan_block(ext2_filsys fs,
 	e2fsck_t ctx;
 	blk64_t blk = *block_nr;
 	struct problem_context pctx;
-	struct ext4_orphan_block_tail *tail;
 
 	pd = priv_data;
 	ctx = pd->ctx;
-	tail = ext2fs_orphan_block_tail(fs, pd->buf);
 
 	/* Orphan file must not have holes */
 	if (!blk) {
@@ -597,12 +595,18 @@ return_abort:
 		pd->abort = 1;
 		return BLOCK_ABORT;
 	}
-	/*
-	 * Update checksum to match expected buffer contents with appropriate
-	 * block number.
-	 */
-	tail->ob_checksum = ext2fs_do_orphan_file_block_csum(fs, pd->ino,
-							pd->generation, blk, pd->buf);
+
+	if (ext2fs_has_feature_metadata_csum(fs->super)) {
+		struct ext4_orphan_block_tail *tail;
+
+		tail = ext2fs_orphan_block_tail(fs, pd->buf);
+		/*
+		 * Update checksum to match expected buffer contents with
+		 * appropriate block number.
+		 */
+		tail->ob_checksum = ext2fs_do_orphan_file_block_csum(fs,
+				pd->ino, pd->generation, blk, pd->buf);
+	}
 
 	if (!pd->clear) {
 		pd->errcode = io_channel_read_blk64(fs->io, blk, 1, pd->block_buf);