Re: After block device error, FICLONE and sync_file_range() make NULs, unlike read()

Noah Misch <noah@xxxxxxxxxxxx> · Fri, 9 Dec 2022 23:43:44 -0800

On Mon, Nov 28, 2022 at 06:50:59PM -0800, Darrick J. Wong wrote:
> On Sat, Nov 19, 2022 at 05:34:12PM -0800, Noah Misch wrote:
> > On Tue, Nov 15, 2022 at 07:14:47PM -0800, Darrick J. Wong wrote:
> > > On Wed, Nov 09, 2022 at 08:54:52PM -0800, Noah Misch wrote:
> > > > Subject line has my typo: s/sync_file_range/copy_file_range/

> > > Another dumb thing about how the pagecache tracks errors is that it sets
> > > a single state bit for the whole mapping, which means that we can't
> > > actually /tell/ userspace which part of their file is now busted.  We
> > > can't even tell if userspace has successfully rewrite()d all the regions
> > > where writeback failed, which leads me to...
> > > 
> > > Another another dumb thing about how the pagecache tracks errors is that
> > > any fsync-like operation will test_and_clear_bit the EIO state, which
> > > means that if we find a past EIO, we'll clear that state and return the
> > > EIO to userspace.
> > > 
> > > We /could/ change FICLONE to flush the dirty pagecache, sample the EIO
> > > status *without* clearing it, and return EIO if it's set.  That's
> > > probably the most unabsurd way to deal with this, but it's unsettling
> > > that even cp ignores errno returns now.  The manpage for FICLONE doesn't
> > > explicitly mention any fsync behaviors, so perhaps "flush and retain
> > > EIO" is the right choice here.
> > 
> > That reminds me of
> > https://postgr.es/m//20180427222842.in2e4mibx45zdth5@xxxxxxxxxxxxxxxxx.  Its
> > summary of a LSF/MM 2018 discussion mentioned NFS writeback errors detected
> > and cleared at close(), which I find similar.  I might favor a uniform policy,
> > one of:
> > 
> > a. Any syscall with a file descriptor argument might return EIO.  If it does,
> >    it clears the EIO.
> > b. Any syscall with a file descriptor argument might return EIO.  Only a
> >    specific list of syscalls, having writeback-oriented names, clear EIO:
> >    fsync(), syncfs(), [...].  Others report EIO without clearing it.
> > 
> > One argument for (b) is that, on EIO from FICLONE or copy_file_range(), the
> > caller can't know whether the broken file is the source or the destination.  A
> > cautious caller should assume both are broken.  What other considerations
> > should influence the decision?
> 
> That's a very good point you've raised -- userspace can't associate an
> EIO return value with either of the fds in use.  It can't even tell if
> the filesystem itself hit some metadata error somewhere else (e.g.
> refcount data), and that's the real reason why EIO got thrown back to
> userspace.
> 
> On those grounds, I think FICLONE/FIDEDUPERANGE need to preserve the
> AS_EIO/AS_ENOSPC state in the address_space so that actual fsync (or
> syncfs, or any of the known 'persist me now' calls) can also return the
> status.
> 
> I'll try to push that for 6.3.

That sounds good.  Thank you.  Please CC me on any threads you create for
this, if not inconvenient.
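
For illustration only, here is a minimal sketch of the "cautious caller" idea
quoted above: on EIO from FICLONE, treat both source and destination as
suspect, since the errno can't be tied to either fd.  It is not taken from any
patch in this thread; the enum, classify_clone_errno() and cautious_clone() are
hypothetical names, and the choice of which errnos mean "fall back to a plain
copy" is my assumption.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FICLONE */

/* Hypothetical verdicts a caller might act on after attempting a clone. */
enum clone_verdict {
    CLONE_OK,            /* clone succeeded */
    CLONE_UNSUPPORTED,   /* fall back to an ordinary read/write copy */
    CLONE_BOTH_SUSPECT,  /* EIO: cannot tell which file is damaged */
    CLONE_FAILED         /* some other error; inspect errno */
};

/* Map an ioctl() return value and errno to a verdict.  Kept separate from
 * the ioctl call so the policy can be exercised without a filesystem. */
static enum clone_verdict classify_clone_errno(int rc, int err)
{
    if (rc == 0)
        return CLONE_OK;
    switch (err) {
    case EIO:
        /* The error may belong to the source, the destination, or
         * unrelated filesystem metadata; assume the worst for both. */
        return CLONE_BOTH_SUSPECT;
    case EOPNOTSUPP:
    case EXDEV:
        /* Assumption: these mean reflink is unavailable for this pair. */
        return CLONE_UNSUPPORTED;
    default:
        return CLONE_FAILED;
    }
}

/* Attempt to reflink src_fd into dst_fd and classify the outcome. */
static enum clone_verdict cautious_clone(int dst_fd, int src_fd)
{
    int rc = ioctl(dst_fd, FICLONE, src_fd);
    return classify_clone_errno(rc, rc == -1 ? errno : 0);
}
```

A caller receiving CLONE_BOTH_SUSPECT would then re-verify or rewrite both
files rather than only the destination.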