On Mon, Mar 29, 2021 at 01:05:55PM -0600, Andreas Dilger wrote: > >> I bet it wasn't done by design -- afaict all the recovery tools are > >> totally opportunistic in that /if/ they find something that looks like a > >> directory entry, /then/ it will pick that up. The names will eventually > >> get overwritten, so that's the best they can do. > >> > >> (I would also wager that people don't like opt-out for new behaviors > >> unless you're adding it as part of a new feature...) It certainly wasn't deliberate, and I'll note that we've already compromised recovery tools because of how deleted inodes were changed when we added journalling to ext3. Previously, we just zeroed i_links_count and then updated all of the block allocation bitmaps and block group descriptors' free block counts. But there's the possibility, especially for very big files, that all of the blocks that might need to be touched for a truncate might not fit in a single transaction. So we now handle the final iput of a deleted files by doing a journalled truncate, which means that all of the logical to physical mapping gets wiped after the delete is commited. You can see this if you execute the following series of commands: % mke2fs -F -q -t ext2 /tmp/foo.img 1G % sudo mount -o loop -t ext4 /tmp/foo.img /mnt % sudo cp /etc/motd /mnt/motd % sync % debugfs /tmp/foo.img -R 'stat <12>' % sudo rm /mnt/motd % sudo umount /mnt % debugfs /tmp/foo.img -R 'stat <12>' ... and compare the debugfs output before and after the file is deleted. Then try the same thing mounting with ext2 instead of ext4, with a kernel that has ext2 compiled in (as opposed to using ext4 in compatibility mode). This has been true ever since ext3 was released in 2001, and people have largely not noticed or complained. It would be possible to do better, if we wanted to better support the "Root Oops"[1] recovery case; we could do a two-pass scan over the file, determine how many block groups would need to be updated, and then try to open a handle with the requisite number of credits. If that succeeds, then we can skip zapping the extent tree or indirect blocks. Otherwise, we fall back to the current truncate code path. But it would slow down our performance of unlinks, and over the last two decades, as far as I know, no one has complained. I was the person who suggested using a mount option, but on reflection, given that we *already* pretty much make it impossible to recover the contents of a deleted file, do we really care about whether the file name can be recovered? Hence, I'm beginning to think that perhaps we shouldn't make it be a tunable at all. After all, zeroing the directory entry is not going to cause any kind of performance penalty, and if we add a mount option, it's one more thing mount option we need to support forever. If we really must make it be a tunable, a lighter weight to do this would be to assign a new EXT2_FLAGS_xxx bit in es->s_flags, and allow tune2fs to be able to adjust the field. But I think a strong case could be made that even that is overkill. Cheers, - Ted [1] A friend of mine back in my undergraduate days once took a scanned image of a Kellogs Froot Loops cereal box, and did some creative image editing to make it read "root oops", and replaced the text "Sweetened Multigrain Cereal" with "rm is forever". I should really ask her if she still has a copy of the image. :-)