Re: [PATCH v2 5/5] xfs: revalidate imap properly before writeback delalloc conversion

Brian Foster <bfoster@xxxxxxxxxx> · Mon, 21 Jan 2019 12:42:56 -0500

On Mon, Jan 21, 2019 at 07:48:33AM -0800, Christoph Hellwig wrote:
> On Fri, Jan 18, 2019 at 01:39:15PM -0500, Brian Foster wrote:
> > This doesn't really seem all that different to me. Rather than firm up
> > the range in the caller, we turn XFS_BMAPI_DELALLOC into something that
> > seemingly behaves a bit more like CONVERT_ONLY.
> 
> The differences are:
> 
>  (a) we save one lookup in the extent tree
>  (b) we have a less fragile API
> 
> > A few notes/thoughts:
> > 
> > 1. What's the purpose of the nimaps == 0 check in
> > xfs_iomap_write_allocate()? If we got to this point with the page lock
> > held, shouldn't we always expect to find something backing the current
> > page?
> 
> It protects against the case where we migrate a COW fork mapping to the
> data fork, which is not protected by the page lock.  But I guess the
> check warrants a comment and an assert.
> 

Yeah, probably. It's not really clear to me what that means.

> > 2. If so, then it also seems that the whole "eof:" thing in
> > xfs_map_blocks() should never happen for data forks. If that's the case,
> > the use of the eof label there seems gratuitous.
> 
> Let me try with asserts enabled.
> 

Ok, thanks.

> > 
> > 3. If the starting bno param to xfs_bmapi_write() lands in a hole (due
> > to racing with a hole punch for example) and it finds some subsequent
> > delalloc extent to convert in the requested range, the arithmetic in
> > xfs_bmapi_trim_map() actually fabricates physical blocks for the hole
> > portion of the range relative to the startblock of the converted
> > delalloc extent. I don't think that causes a problem with the writeback
> > code because the fabricated blocks are before the page offset, but those
> > semantics are insane. I think you need to do something like fix up
> > xfs_bmapi_write() to return the actual converted mapping and make sure
> > xfs_iomap_write_allocate() handles that it might start beyond
> > map_start_fsb.
> 
> Sounds good.  To be honest I think the whole idea of converting the
> mapping before the requested offset is a rather bad idea to start
> with, just increasin our window that exposes stale data.  But to fix
> this properly we need to create everything as unwritten first, and
> I didn't have time to get back to that series.

Eh, I think that's a separate problem that re: your other series, we
already know how to fix properly. I don't think we should mess around
with something as fairly fundamental as delayed block allocation
behavior for an approach that doesn't address the underlying problem.

Brian