On Tue, Dec 10, 2019 at 08:23:40AM -0500, Brian Foster wrote: > generic/522 (fsx) occasionally fails with a file corruption due to > an insert range operation. The primary characteristic of the > corruption is a misplaced insert range operation that differs from > the requested target offset. The reason for this behavior is a race > between the extent shift sequence of an insert range and a COW > writeback completion that causes a front merge with the first extent > in the shift. How is the COW writeback completion modifying the extent list while an extent shift is modifying the extent list? Both should be running under XFS_ILOCK_EXCL contexts so there shouldn't be a race condition here unless we've screwed up the extent list modification atomicity... > > The shift preparation function flushes and unmaps from the target > offset of the operation to the end of the file to ensure no > modifications can be made and page cache is invalidated before file > data is shifted. An insert range operation then splits the extent at > the target offset, if necessary, and begins to shift the start > offset of each extent starting from the end of the file to the start > offset. The shift sequence operates at extent level and so depends > on the preparation sequence to guarantee no changes can be made to > the target range during the shift. Oh... shifting extents is not an atomic operation w.r.t. other inode modifications - both insert and collapse run individual modification transactions and lock/unlock the inode around each transaction. So, essentially, they aren't atomic when faced with other *metadata* modifications to the inode. > If the block immediately prior to > the target offset was dirty and shared, however, it can undergo > writeback and move from the COW fork to the data fork at any point > during the shift. If the block is contiguous with the block at the > start offset of the insert range, it can front merge and alter the > start offset of the extent. Once the shift sequence reaches the > target offset, it shifts based on the latest start offset and > silently changes the target offset of the operation and corrupts the > file. Yup, that's exactly the landmine that non-atomic, multi-transaction extent range operations have. It might be a COW operation, it might be something else that ends up manipulating the extent list. But because the ILOCK is not held across the entire extent shift, insert/collapse are susceptible to corruption when any other XFs code concurrently modifies the extent list. I think insert/collapse need to be converted to work like a truncate operation instead of a series on individual write operations. That is, they are a permanent transaction that locks the inode once and is rolled repeatedly until the entire extent listi modification is done and then the inode is unlocked. > To address this problem, update the shift preparation code to > stabilize the start boundary along with the full range of the > insert. Also update the existing corruption check to fail if any > extent is shifted with a start offset behind the target offset of > the insert range. This prevents insert from racing with COW > writeback completion and fails loudly in the event of an unexpected > extent shift. It looks ok to avoid this particular symptom (backportable point fix), but I really think we should convert insert/collapse to be atomic w.r.t other extent list modifications.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx