On Sun, Feb 26 2017, James Bottomley wrote:

> [added linux-scsi and linux-block because this is part of our error
> handling as well]
> On Sun, 2017-02-26 at 09:42 -0500, Jeff Layton wrote:
>> Proposing this as a LSF/MM TOPIC, but it may turn out to be me just
>> not understanding the semantics here.
>>
>> As I was looking into -ENOSPC handling in cephfs, I noticed that
>> PG_error is only ever tested in one place [1]
>> __filemap_fdatawait_range, which does this:
>>
>>     if (TestClearPageError(page))
>>         ret = -EIO;
>>
>> This error code will override any AS_* error that was set in the
>> mapping. Which makes me wonder...why don't we just set this error in
>> the mapping and not bother with a per-page flag? Could we potentially
>> free up a page flag by eliminating this?
>
> Note that currently the AS_* codes are only set for write errors not
> for reads and we have no mapping error handling at all for swap pages,
> but I'm sure this is fixable.

How is a read error different from a failure to set PG_uptodate?
Does PG_error suppress retries?

>
> From the I/O layer point of view we take great pains to try to pinpoint
> the error exactly to the sector. We reflect this up by setting the
> PG_error flag on the page where the error occurred. If we only set the
> error on the mapping, we lose that granularity, because the mapping is
> mostly at the file level (or VMA level for anon pages).

Are you saying that the IO layer finds the page in the bi_io_vec and
explicitly sets PG_error, rather than just passing an error indication
to bi_end_io ?? That would seem to be wrong as the page may not be in
the page cache. So I guess I misunderstand you.

>
> So I think the question for filesystem people from us would be do you
> care about this accuracy? If it's OK just to know an error occurred
> somewhere in this file, then perhaps we don't need it.

I had always assumed that a bio would either succeed or fail, and that
no finer granularity could be available.

I think the question here is: Do filesystems need the pagecache to
record which pages have seen an IO error?
I think that for write errors, there is no value in recording
block-oriented error status - only file-oriented status.
For read errors, it might help to avoid indefinite read retries, but
I don't know the code well enough to be sure if this is an issue.

NeilBrown

>
> James
>
>> The main argument I could see for keeping it is that removing it
>> might subtly change the behavior of sync_file_range if you have tasks
>> syncing different ranges in a file concurrently. I'm not sure if that
>> would break any guarantees though.
>>
>> Even if we do need it, I think we might need some cleanup here
>> anyway. A lot of readpage operations end up setting that flag when
>> they hit an error. Isn't it wrong to return an error on fsync, just
>> because we had a read error somewhere in the file in a range that was
>> never dirtied?
>>
>> --
>> [1]: there is another place in f2fs, but it's more or less equivalent
>> to the call site in __filemap_fdatawait_range.
>>
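
For anyone following the thread without the source open, the override
Jeff describes (a per-page error flag taking precedence over an AS_*
error on the mapping) can be modelled in user space. This is only a
rough sketch under stated assumptions: every toy_* name below is
hypothetical, the structs are stand-ins rather than the kernel's
struct page and struct address_space, and the ordering of the two
checks is my reading of the quoted description, not a copy of
mm/filemap.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures; all names are hypothetical. */
struct toy_page {
    bool error;              /* models PG_error */
};

struct toy_mapping {
    bool as_eio;             /* models an AS_EIO flag on the mapping */
    bool as_enospc;          /* models an AS_ENOSPC flag on the mapping */
    struct toy_page *pages;
    size_t nr_pages;
};

/* Models TestClearPageError(): return the old flag value and clear it. */
static bool toy_test_clear_error(struct toy_page *p)
{
    bool was_set = p->error;
    p->error = false;
    return was_set;
}

/* Models the mapping-level check: report and clear any AS_* error. */
static int toy_check_mapping_errors(struct toy_mapping *m)
{
    int ret = 0;

    if (m->as_enospc) {
        m->as_enospc = false;
        ret = -ENOSPC;
    }
    if (m->as_eio) {
        m->as_eio = false;
        ret = -EIO;
    }
    return ret;
}

/* Models the per-page loop: any page error in [start, end] yields -EIO. */
static int toy_fdatawait_pages(struct toy_mapping *m, size_t start, size_t end)
{
    int ret = 0;

    for (size_t i = start; i <= end && i < m->nr_pages; i++)
        if (toy_test_clear_error(&m->pages[i]))
            ret = -EIO;
    return ret;
}

/*
 * Combined wait: the per-page result wins, so a page-level -EIO hides
 * a mapping-level -ENOSPC (which is still consumed by the check).
 */
static int toy_fdatawait(struct toy_mapping *m, size_t start, size_t end)
{
    int page_ret = toy_fdatawait_pages(m, start, end);
    int map_ret = toy_check_mapping_errors(m);

    return page_ret ? page_ret : map_ret;
}
```

In this model, a mapping with as_enospc set and one page with error set
reports -EIO from toy_fdatawait(), and the -ENOSPC is consumed without
ever being returned. It also shows the sync_file_range point: a waiter
whose range does not cover the failed page sees no page error at all,
which is the per-range behaviour that a mapping-only flag would change.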