On Thu, Jan 14, 2021 at 5:53 AM Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> On Tue, Jan 05, 2021 at 02:28:57PM +0800, yangerkun wrote:
> > We got a "deleted inode referenced" warning during our fsstress test. The
> > bug can be reproduced easily with the following steps:
> >
> >   cd /dev/shm
> >   mkdir test/
> >   fallocate -l 128M img
> >   mkfs.ext4 -b 1024 img
> >   mount img test/
> >   dd if=/dev/zero of=test/foo bs=1M count=128
> >   mkdir test/dir/ && cd test/dir/
> >   for ((i=0;i<1000;i++)); do touch file$i; done # consume all blocks
> >   cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
> >     /dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
> >     ext4_rename will return ENOSPC!!
> >   cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
> >
> > We will get the output:
> > "ls: cannot access 'test/dir/file1': Structure needs cleaning"
> > and dmesg shows:
> > "EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
> > deleted inode referenced: 139"
> >
> > ext4_rename will create a special inode for the whiteout and use this 'ino'
> > to replace the source file's dir entry 'ino'. Once an error happens
> > later (the error above was the ENOSPC returned from ext4_add_entry in
> > ext4_rename, since all space has been consumed), the cleanup does drop the
> > nlink for the whiteout, but forgets to restore 'ino' to that of the source
> > file. This will trigger the bug described above.
> >
> > Signed-off-by: yangerkun <yangerkun@xxxxxxxxxx>
>

Apropos RENAME_WHITEOUT, it seems to be missing __ext4_fc_track_link().

I guess test coverage of RENAME_WHITEOUT in fstests is thin. I have been
seeing a trickle of bug fixes for RENAME_WHITEOUT in almost every filesystem
that supports it. But I must say it would have been very hard to catch a
missing ext4_fc_track_* call without a specialized fs fuzzer such as the
CrashMonkey generated tests.

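For anyone trying the quoted reproducer: the renameat2() step is a syscall,
not a shell command, so it needs a small helper program. Below is a minimal
C sketch of that step, not the reporter's actual test code. The helper names
rename_whiteout() and is_whiteout() are made up for illustration. It assumes
a Linux system whose headers define SYS_renameat2, and it relies on the
overlayfs convention that a whiteout is a character device with device
number 0:0.

```c
#define _GNU_SOURCE
#include <fcntl.h>          /* AT_FDCWD */
#include <sys/stat.h>       /* struct stat, S_ISCHR */
#include <sys/syscall.h>    /* SYS_renameat2 */
#include <sys/sysmacros.h>  /* major(), minor(), makedev() */
#include <unistd.h>         /* syscall() */

#ifndef RENAME_WHITEOUT
#define RENAME_WHITEOUT (1 << 2)  /* same value as in the kernel UAPI headers */
#endif

/*
 * Hypothetical helper: rename src to dst and ask the filesystem to leave
 * a whiteout behind at src. Returns 0 on success, -1 with errno set on
 * failure (ENOSPC in the scenario from the bug report).
 */
static int rename_whiteout(const char *src, const char *dst)
{
    return (int)syscall(SYS_renameat2, AT_FDCWD, src, AT_FDCWD, dst,
                        RENAME_WHITEOUT);
}

/*
 * Hypothetical helper: report whether a stat result looks like a
 * whiteout, i.e. a character device with device number 0:0.
 */
static int is_whiteout(const struct stat *st)
{
    return S_ISCHR(st->st_mode) &&
           major(st->st_rdev) == 0 && minor(st->st_rdev) == 0;
}
```

In the reproducer one would call rename_whiteout("/dev/shm/test/dir/file1",
"/dev/shm/test/dir/dst_file") and expect -1 with errno set to ENOSPC. On a
successful rename, lstat() on the source path followed by is_whiteout()
should return 1. Depending on the kernel version, creating a whiteout this
way may require CAP_MKNOD.
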
And as long as I am ranting, I'd like to point out that it is a shame that
whiteout was not implemented as a special (constant) inode whose nlink is
irrelevant (or a special dirent with d_ino 0 and d_type DT_WHT for that
matter). It would have been a rather small RO_COMPAT on-disk change for ext4.
It could also be implemented in a slightly more backward-compatible manner by
maintaining a valid nlink and postponing the setting of the RO_COMPAT flag
until EXT4_LINK_MAX is reached.

As things stand now, overlayfs makes an effort to maintain a singleton
hardlinked whiteout inode, without being able to use it with RENAME_WHITEOUT,
and filesystems have to take special care to journal the metadata of every
individual whiteout inode, without any added value to the only user
(overlayfs).

But I guess that train left the station long ago...

Thanks,
Amir.