Re: [PATCH] xfs: fix shared extent data corruption due to missing cow reservation

Brian Foster <bfoster@xxxxxxxxxx> · Thu, 15 Nov 2018 11:10:54 -0500

On Thu, Nov 15, 2018 at 09:59:33AM -0600, Eric Sandeen wrote:
> 
> 
> On 11/15/18 9:58 AM, Brian Foster wrote:
> > On Thu, Nov 15, 2018 at 09:51:58AM -0600, Eric Sandeen wrote:
> >> On 11/13/18 11:08 AM, Brian Foster wrote:
> >>
> >>> For example, a buffered write occurs across the file offset (in FSB
> >>> units) range of [29, 57]. A shared extent exists at blocks [29, 35]
> >>> and COW reservation already exists at blocks [32, 34]. After
> >>> accommodating a COW extent size hint of 32 blocks and the existing
> >>> reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
> >>> blocks of reservation at offset 0 and returns with COW reservation
> >>> across the range of [0, 34]. The associated data fork extent is
> >>> still [29, 35], however, which isn't fully covered by the COW
> >>> reservation.
> >>>
> >>> This leads to a buffered write at file offset 35 over a shared
> >>> extent without associated COW reservation. Writeback eventually
> >>> kicks in, performs an overwrite of the underlying shared block and
> >>> causes the associated data corruption.
> >>
> >> Can you write this in the form of an xfstests reproducer please? :)
> >>
> > 
> > I'll add it to the todo list.
> 
> thanks, it doesn't seem like the kind of thing that will be hit too often
> at random, based on the struggles to reproduce it as first reported via
> fsstress.
> 

This reminds me that I wanted to look into DEBUG mode, writeback-time
detection of overwrites of shared extents. I think part of the
difficulty of reproducing this via shared/010 is that detecting the
corruption required a cached page over that particular shared block in
another inode. If we can assert at writeback time that overwrites are
always !shared (i.e., never target a shared block), that removes that
requirement and may avoid the need for a new test.

Brian

> -Eric