Re: Better read bio error granularity?

Qu Wenruo <quwenruo.btrfs@xxxxxxx> · Sun, 13 Mar 2022 19:03:39 +0800

On 2022/3/13 18:55, Matthew Wilcox wrote:
> On Sun, Mar 13, 2022 at 06:24:32PM +0800, Qu Wenruo wrote:
>> Since if any of the split bios gets an error, the whole bio will have
>> bi_status set to some error number.
>>
>> This is completely fine for a write bio, but I'm wondering whether we can
>> get better granularity by introducing a per-bvec bi_status or using page
>> status?
>>
>> One situation is, for an fs like btrfs or a dm device like dm-verity, a
>> large bio is submitted, say a 128K one, and one of the pages fails to
>> pass checksum/hmac verification.
>>
>> Then the whole 128K will be marked as an error, while in fact the
>> remaining 124K is completely fine.
>>
>> Can this be solved by something like a per-bvec bi_status, or by using the
>> page error status to indicate exactly where the error is?
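
To make the 128K / 124K arithmetic concrete, here is a userspace toy model,
not kernel code. ToyBio, bad_pages and both status methods are hypothetical
names invented for illustration, and a 4K page size is assumed. It only
contrasts a single whole-bio status with an imagined per-page status:

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumption: 4K pages, matching the 128K / 124K example
EIO = 5           # stand-in for the kernel's -EIO


@dataclass
class ToyBio:
    """Userspace toy model of a read bio; this is NOT struct bio."""
    nr_pages: int
    bad_pages: set = field(default_factory=set)  # pages that fail verification

    def whole_bio_status(self) -> int:
        # Behaviour described in the thread: one status for the entire bio.
        return -EIO if self.bad_pages else 0

    def per_page_status(self) -> list:
        # Hypothetical per-bvec status: one entry per page.
        return [-EIO if i in self.bad_pages else 0 for i in range(self.nr_pages)]


bio = ToyBio(nr_pages=128 * 1024 // PAGE_SIZE, bad_pages={7})
good_bytes = bio.per_page_status().count(0) * PAGE_SIZE
print(bio.whole_bio_status(), good_bytes // 1024)  # prints: -5 124
```

With one bad page out of 32, the single status reports failure for all 128K,
while the per-page view shows that 124K was readable.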

> In general, I think we want to keep this simple; the BIO has an error.
> If the user wants more fine granularity on the error, they can resubmit
> a smaller I/O, or hopefully some day we get a better method of reporting
> errors to the admin than "some random program got EIO".

Indeed this looks much simpler.

> Specifically for the page cache (which I hope is what you meant by
> "page error status", because we definitely can't use that for DIO),

Although what I meant exactly is the PageError flag.

For DIO the pages are not mapped to any inode, but that shouldn't prevent
us from using the PageError flag, I guess?

> the intent is that ->readahead can just fail and not set any of the
> pages Uptodate.  Then we come along later, notice there's a page in
> the cache and call ->readpage on it and get the error for that one
> page.  The other 31 pages should read successfully.
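
The fallback described above can be sketched as a userspace toy model. It is
not the kernel's ->readahead / ->readpage implementation; read_range and
readahead_then_readpage are hypothetical helpers, and the only assumption is
that a multi-page read fails as a whole when any page in it is bad:

```python
EIO = 5  # stand-in errno value


def read_range(pages, bad):
    """Toy multi-page read: fails as a whole if any page in it is bad."""
    if any(p in bad for p in pages):
        raise OSError(EIO, "read failed")
    return set(pages)


def readahead_then_readpage(nr_pages, bad):
    """Try one large read first; on failure, retry one page at a time."""
    uptodate, failed = set(), set()
    try:
        # The ->readahead-like attempt: all pages in a single read.
        uptodate |= read_range(range(nr_pages), bad)
    except OSError:
        # No page was marked uptodate, so fall back to per-page reads
        # (the ->readpage-like path) and record which pages fail.
        for p in range(nr_pages):
            try:
                uptodate |= read_range([p], bad)
            except OSError:
                failed.add(p)
    return uptodate, failed


uptodate, failed = readahead_then_readpage(32, bad={7})
print(len(uptodate), sorted(failed))  # prints: 31 [7]
```

Only the bad page ends up with an error; the other 31 become uptodate after
the per-page retry.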

This raises a small question: what prevents the fs from submitting a large
bio containing the 32 pages again, rather than reading them page by page?

Is it just because the page is there, but not Uptodate?

> (There's an awkward question to ask about large folios here, and what
> we might be able to do around sub-folio or even sub-page or sub-block
> reads that happen to not touch the bytes which are in an error region,
> but let's keep the conversation about pages for now).

Yeah, that can go crazy pretty soon.

Like iomap or btrfs, they all use page::private to store extra bitmaps
for those cases, so it's really impossible to use the PageError flag there.
Thus I intentionally skipped them here.

Thank you very much for the quick and helpful reply,
Qu