Hi all,

Recently I've been digging into a corruption issue which I think is just
about pinned down, but I'd appreciate some more expert EXT4 eyes to confirm
we're on the right path.

What we have boils down to a system with
- An ext4 filesystem with the journal disabled
- A workload[0] which in a loop
  - Creates a lot of small files
  - Occasionally deletes these files and collects them into a single
    larger "compound" file
  - Checks the header of all of these files periodically to ensure
    they're correct

After a while this check fails, and when inspecting the "bad" file, the
contents of that file are actually an EXT4 extent structure, for example:

    [ec2-user@ip-172-31-0-206 ~]$ hexdump -C _2w.si
    00000000  0a f3 05 00 54 01 00 00  00 00 00 00 00 00 00 00  |....T...........|
    00000010  01 00 00 00 63 84 08 05  01 00 00 00 ff 01 00 00  |....c...........|
    00000020  75 8a 1c 02 00 02 00 00  00 02 00 00 00 9c 1c 02  |u...............|
    00000030  00 04 00 00 dc 00 00 00  00 ac 1c 02 dc 04 00 00  |................|
    00000040  08 81 00 00 dc ac 1c 02  00 00 00 00 00 00 00 00  |................|
    00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00000170  00 00 00                                          |...|
    00000173

This has EXT4_EXT_MAGIC (cpu_to_le16(0xf30a)), and when parsed as an extent
header plus extent array it has 5 extent entries at depth 0. By the time the
file is checked, the file that these extents presumably pointed to appears
to have been deleted, but the physical blocks they reference still contain
what looks like the data of one of the larger files this test creates.

Based on that, what I think is happening is
- A file with separate (i.e. non-inline) extents is synced / written to
  disk (in this case, one of the large "compound" files)
- ext4_end_io_end() kicks off writeback of extent metadata
  - AIUI this marks the related buffers dirty but does not wait on them
    in the no-journal case
- The file is deleted, causing the extents to be "removed" and the blocks
  where they were stored to be marked unused
- A new file is created (any file, separate extents not required)
- The new file is allocated the block that was just freed (the physical
  block where the old extents were located)

Some time between this point and when the file is next read, the dirty
extent buffer hits the disk instead of the intended data for the new file.

A big-hammer hack in __ext4_handle_dirty_metadata() to always sync metadata
blocks appears to avoid the issue but isn't ideal. Most likely a better
solution would be to ensure any dirty metadata buffers are synced before
the inode is dropped.

Overall does this summary sound valid, or have I wandered into the weeds
somewhere?

Cheers,
Sam Mendoza-Jonas

[0] This is an Elasticsearch/Lucene workload, running the esrally tests to
hit the issue.
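
For anyone who wants to cross-check the extent interpretation of the hexdump above, here is a minimal Python sketch that decodes the non-zero part of the file. It assumes the on-disk layouts of struct ext4_extent_header and struct ext4_extent as declared in fs/ext4/ext4_extents.h, and it assumes the kernel's convention that an ee_len above 32768 marks an unwritten extent. The function name parse_extent_block is purely illustrative and is not a kernel or e2fsprogs API.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A      # value quoted in the report
EXT_INIT_MAX_LEN = 1 << 15   # assumed: larger ee_len means "unwritten"

# First 0x50 bytes of the hexdump; the remainder of the file is zeros.
RAW = bytes.fromhex(
    "0a f3 05 00 54 01 00 00 00 00 00 00 00 00 00 00 "
    "01 00 00 00 63 84 08 05 01 00 00 00 ff 01 00 00 "
    "75 8a 1c 02 00 02 00 00 00 02 00 00 00 9c 1c 02 "
    "00 04 00 00 dc 00 00 00 00 ac 1c 02 dc 04 00 00 "
    "08 81 00 00 dc ac 1c 02 00 00 00 00 00 00 00 00"
)


def parse_extent_block(buf):
    """Decode an extent header and, for depth 0, its leaf extents."""
    # Header: eh_magic, eh_entries, eh_max, eh_depth (le16), eh_generation (le32)
    magic, entries, max_entries, depth, _generation = struct.unpack_from(
        "<HHHHI", buf, 0
    )
    if magic != EXT4_EXT_MAGIC:
        raise ValueError("not an ext4 extent header: magic=%#x" % magic)

    header = {"entries": entries, "max": max_entries, "depth": depth}
    extents = []
    if depth == 0:
        for i in range(entries):
            # Leaf: ee_block (le32), ee_len (le16), ee_start_hi (le16), ee_start_lo (le32)
            ee_block, ee_len, start_hi, start_lo = struct.unpack_from(
                "<IHHI", buf, 12 + 12 * i
            )
            unwritten = ee_len > EXT_INIT_MAX_LEN
            extents.append({
                "logical": ee_block,
                "length": ee_len - EXT_INIT_MAX_LEN if unwritten else ee_len,
                "physical": (start_hi << 32) | start_lo,
                "unwritten": unwritten,
            })
    return header, extents


header, extents = parse_extent_block(RAW)

if __name__ == "__main__":
    print(header)
    for ext in extents:
        print(ext)
```

Under those assumptions the five entries cover contiguous logical ranges, which is at least consistent with this being a real leaf extent block rather than coincidental data.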