Re: After block device error, FICLONE and sync_file_range() make NULs, unlike read()

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Tue, 13 Dec 2022 11:20:36 -0800

On Fri, Dec 09, 2022 at 11:43:44PM -0800, Noah Misch wrote:
> On Mon, Nov 28, 2022 at 06:50:59PM -0800, Darrick J. Wong wrote:
> > On Sat, Nov 19, 2022 at 05:34:12PM -0800, Noah Misch wrote:
> > > On Tue, Nov 15, 2022 at 07:14:47PM -0800, Darrick J. Wong wrote:
> > > > On Wed, Nov 09, 2022 at 08:54:52PM -0800, Noah Misch wrote:
> > > > > Subject line has my typo: s/sync_file_range/copy_file_range/
> 
> > > > Another dumb thing about how the pagecache tracks errors is that it sets
> > > > a single state bit for the whole mapping, which means that we can't
> > > > actually /tell/ userspace which part of their file is now busted.  We
> > > > can't even tell if userspace has successfully rewrite()d all the regions
> > > > where writeback failed, which leads me to...
> > > > 
> > > > Another another dumb thing about how the pagecache tracks errors is that
> > > > any fsync-like operation will test_and_clear_bit the EIO state, which
> > > > means that if we find a past EIO, we'll clear that state and return the
> > > > EIO to userspace.
> > > > 
> > > > We /could/ change FICLONE to flush the dirty pagecache, sample the EIO
> > > > status *without* clearing it, and return EIO if it's set.  That's
> > > > probably the most unabsurd way to deal with this, but it's unsettling
> > > > that even cp ignores errno returns now.  The manpage for FICLONE doesn't
> > > > explicitly mention any fsync behaviors, so perhaps "flush and retain
> > > > EIO" is the right choice here.
> > > 
> > > That reminds me of
> > > https://postgr.es/m//20180427222842.in2e4mibx45zdth5@xxxxxxxxxxxxxxxxx.  Its
> > > summary of a LSF/MM 2018 discussion mentioned NFS writeback errors detected
> > > and cleared at close(), which I find similar.  I might favor a uniform policy,
> > > one of:
> > > 
> > > a. Any syscall with a file descriptor argument might return EIO.  If it does,
> > >    it clears the EIO.
> > > b. Any syscall with a file descriptor argument might return EIO.  Only a
> > >    specific list of syscalls, having writeback-oriented names, clear EIO:
> > >    fsync(), syncfs(), [...].  Others report EIO without clearing it.
> > > 
> > > One argument for (b) is that, on EIO from FICLONE or copy_file_range(), the
> > > caller can't know whether the broken file is the source or the destination.  A
> > > cautious caller should assume both are broken.  What other considerations
> > > should influence the decision?
> > 
> > That's a very good point you've raised -- userspace can't associate an
> > EIO return value with either of the fds in use.  It can't even tell if
> > the filesystem itself hit some metadata error somewhere else (e.g.
> > refcount data), and that's the real reason why EIO got thrown back to
> > userspace.
> > 
> > On those grounds, I think FICLONE/FIDEDUPERANGE need to preserve the
> > AS_EIO/AS_ENOSPC state in the address_space so that actual fsync (or
> > syncfs, or any of the known 'persist me now' calls) can also return the
> > status.
> > 
> > I'll try to push that for 6.3.
> 
> That sounds good.  Thank you.  Please CC me on any threads you create for
> this, if not inconvenient.

Will do.
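
For anyone following along, the difference between "report and clear"
and "flush and retain EIO" can be illustrated with a toy userspace
model.  This is only a sketch, not kernel code: struct toy_mapping and
every toy_*() function are made-up names for illustration, and the
single bool stands in for the one-bit-per-mapping error state discussed
above.

```c
/* Toy userspace model of the two policies.  NOT kernel code; every
 * name here is hypothetical and exists only for illustration. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_mapping {
	bool eio;	/* one error bit for the whole file */
};

/* Writeback failed somewhere in the file; the model cannot say where. */
static void toy_writeback_failed(struct toy_mapping *m)
{
	m->eio = true;
}

/* fsync-like call: report a past error once, then forget it. */
static int toy_fsync(struct toy_mapping *m)
{
	bool had_error = m->eio;

	m->eio = false;
	return had_error ? -EIO : 0;
}

/* Clone that clears the error, like any other fsync-like call. */
static int toy_clone_clear(struct toy_mapping *src)
{
	return toy_fsync(src);
}

/* Clone that samples the error without clearing it ("flush and retain"). */
static int toy_clone_retain(const struct toy_mapping *src)
{
	return src->eio ? -EIO : 0;
}

/* Walk through both policies; returns 0 if the model behaves as described. */
static int toy_demo(void)
{
	struct toy_mapping m = { .eio = false };

	/* Clearing policy: the clone consumes the error, so a later
	 * fsync by the same or another caller sees nothing. */
	toy_writeback_failed(&m);
	assert(toy_clone_clear(&m) == -EIO);
	assert(toy_fsync(&m) == 0);

	/* Retaining policy: the clone reports the error and a later
	 * fsync still reports it, after which it is cleared. */
	toy_writeback_failed(&m);
	assert(toy_clone_retain(&m) == -EIO);
	assert(toy_fsync(&m) == -EIO);
	assert(toy_fsync(&m) == 0);
	return 0;
}
```

Calling toy_demo() exercises both sequences.  With the retaining
variant, the caller that eventually issues fsync still learns about the
failure even though the clone already returned EIO once.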

--D