Re: [LSF/MM TOPIC] do we really need PG_error at all?

Jeff Layton <jlayton@xxxxxxxxxx> · Sun, 26 Feb 2017 17:43:53 -0500

On Mon, 2017-02-27 at 08:03 +1100, NeilBrown wrote:
> On Sun, Feb 26 2017, James Bottomley wrote:
> 
> > [added linux-scsi and linux-block because this is part of our error
> > handling as well]
> > On Sun, 2017-02-26 at 09:42 -0500, Jeff Layton wrote:
> > > Proposing this as a LSF/MM TOPIC, but it may turn out to be me just 
> > > not understanding the semantics here.
> > > 
> > > As I was looking into -ENOSPC handling in cephfs, I noticed that
> > > PG_error is only ever tested in one place [1] 
> > > __filemap_fdatawait_range, which does this:
> > > 
> > > 	if (TestClearPageError(page))
> > > 		ret = -EIO;
> > > 
> > > This error code will override any AS_* error that was set in the
> > > mapping. Which makes me wonder...why don't we just set this error in 
> > > the mapping and not bother with a per-page flag? Could we potentially
> > > free up a page flag by eliminating this?
> > 
> > Note that currently the AS_* codes are only set for write errors not
> > for reads and we have no mapping error handling at all for swap pages,
> > but I'm sure this is fixable.
> 
> How is a read error different from a failure to set PG_uptodate?
> Does PG_error suppress retries?
> 

The thing is that we only ever TestClearError in write and fsync type
codepaths. Does it make sense to report read errors _at_all_ in fsync?

> > 
> > From the I/O layer point of view we take great pains to try to pinpoint
> > the error exactly to the sector.  We reflect this up by setting the
> > PG_error flag on the page where the error occurred.  If we only set the
> > error on the mapping, we lose that granularity, because the mapping is
> > mostly at the file level (or VMA level for anon pages).
> 
> Are you saying that the IO layer finds the page in the bi_io_vec and
> explicitly sets PG_error, rather than just passing an error indication
> to bi_end_io ??  That would seem to be wrong as the page may not be in
> the page cache. So I guess I misunderstand you.
> 
> > 
> > So I think the question for filesystem people from us would be do you
> > care about this accuracy?  If it's OK just to know an error occurred
> > somewhere in this file, then perhaps we don't need it.
> 
> I had always assumed that a bio would either succeed or fail, and that
> no finer granularity could be available.
> 
> I think the question here is: Do filesystems need the pagecache to
> record which pages have seen an IO error?
> I think that for write errors, there is no value in recording
> block-oriented error status - only file-oriented status.
> For read errors, it might if help to avoid indefinite read retries, but
> I don't know the code well enough to be sure if this is an issue.
> 

Yeah, it might be useful for preventing failing read retries, but I
don't see that it's being used in that way today, unless I'm missing
something.

If PG_error is ultimately needed, I'd like to have some more clearly
defined semantics for it. It's sprinkled all over the place today, and
it's not clear to me that it's being used correctly everywhere.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>