Re: [PATCH 1/2] iomap: use page dirty state to seek data over unwritten extents

Brian Foster <bfoster@xxxxxxxxxx> · Wed, 14 Oct 2020 08:59:55 -0400

On Wed, Oct 14, 2020 at 09:53:44AM +1100, Dave Chinner wrote:
> On Mon, Oct 12, 2020 at 10:03:49AM -0400, Brian Foster wrote:
> > iomap seek hole/data currently uses page Uptodate state to track
> > data over unwritten extents. This is odd and unpredictable in that
> > the existence of clean pages changes behavior. For example:
> > 
> >   $ xfs_io -fc "falloc 0 32k" -c "seek -d 0" \
> > 	    -c "pread 16k 4k" -c "seek -d 0" /mnt/file
> >   Whence  Result
> >   DATA    EOF
> >   ...
> >   Whence  Result
> >   DATA    16384
> 
> I don't think there is any way around this, because the page cache
> lookup done by the seek hole/data code is an
> unlocked operation and can race with other IO and operations. That
> is, seek does not take IO serialisation locks at all so
> read/write/page faults/fallocate/etc all run concurrently with it...
> 
> i.e. we get an iomap that is current at the time the iomap_begin()
> call is made, but we don't hold any locks to stabilise that extent
> range while we do a page cache traversal looking for cached data.
> That means any region of the unwritten iomap can change state while
> we are running the page cache seek.
> 

Hm, Ok.. that makes sense..

> We cannot determine what the data contents without major overhead,
> and if we are seeking over a large unwritten extent covered by clean
> pages that then gets partially written synchronously by another
> concurrent write IO then we might trip across clean uptodate pages
> with real data in them by the time the page cache scan gets to it.
> 
> Hence the only thing we are looking at here is whether there is data
> present in the cache or not. As such, I think assuming that only
> dirty/writeback pages contain actual user data in a seek data/hole
> operation is a fundametnally incorrect premise.
> 

... but afaict this kind of thing is already possible because nothing
stops a subsequently cleaned page (i.e., dirtied and written back) from
also being dropped from cache before the scan finds it. IOW, I don't
really see how this justifies using one page state check over another as
opposed to pointing out the whole page scanning thing itself seems to be
racy. Perhaps the reasoning wrt to seek is simply that we should either
see one state (hole) or the next (data) and we don't terribly care much
about seek being racy..?

My concern is more the issue described by patch 2. Note that patch 2
doesn't necessarily depend on this one. The tradeoff without patch 1 is
just that we'd explicitly zero and dirty any uptodate new EOF page as
opposed to a page that was already dirty (or writeback).

Truncate does hold iolock/mmaplock, but ISTM that is still not
sufficient because of the same page reclaim issue mentioned above. E.g.,
a truncate down lands on a dirty page over an unwritten block,
iomap_truncate_page() receives the unwritten mapping, page is flushed
and reclaimed (changing block state), iomap_truncate_page() (still using
the unwritten mapping) has nothing to do without a page and thus stale
data is exposed.

ISTM that either the filesystem needs to be more involved with the
stabilization of unwritten mappings in general or truncate page needs to
do something along the lines of block_truncate_page() (which we used
pre-iomap) and just explicitly zero/dirty the new page if the block is
otherwise mapped. Thoughts? Other ideas?

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
>