[Bug 200739] I/O error on read-ahead inode blocks does not get detected or reported

bugzilla-daemon@xxxxxxxxxxxxxxxxxxx · Tue, 07 Aug 2018 03:27:06 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=200739

--- Comment #7 from Shehbaz (shehbazjaffer007@xxxxxxxxx) ---
Hello Theodore,

Thank you for your reply to this bug and other bugs.

>Does this actually cause an user-visible problem?   If we do readahead for an
>inode table block never gets used by the user, and that block is never used
>(perhaps because no inodes have been written using that inode table block),
>why should we mark the file system as corrupted?
I agree this does not cause a user-visible problem. I think we should at least
warn the user about disk corruption, because we did not receive the block that
we requested from the disk. The purpose of read-ahead was to read all blocks
ahead of the requested block, up to a certain limit (2 blocks in my
experiment). If the blocks do not exist, then returning with 1 or 0 blocks is
correct. If the blocks exist but could not be read because of a media error, I
believe this should be reported to the user.

>Especially given that with modern block devices, when we *do* write to the
>inode table block, it will probably use redirect the failed sector to a spare
>block replacement pool automatically, at which point subsequent reads to that
>inode table block will be *fine*.
I agree that in the case of writes, the newly written block would get
redirected to a healthy sector. However, for a read-only workload, a read I/O
error should be detected proactively and handled immediately. For btrfs on
HDDs, I see the btrfs-scrub daemon being invoked as soon as any form of
corruption or I/O error is detected during a read operation. This replaces the
older metadata block with a duplicate copy. For ext4, I do not see any
warning.

>So prematurely deciding that just because an speculative, readahead access to a
>sector returns a media error, is grounds to declare the file system corrupted
>(which could force a reboot if errors=panic is set), seems to be a massive
>overreaction.
>Why do you think we should signal an error in this case?
I am unsure whether we should reboot because of a read-ahead failure, since
the current operation was not affected by the failed read-ahead block.
However, either a warning message or a recommendation to run e2fsck (e.g.
"structure needs cleaning") should be provided, so that the user knows the
media is not working correctly: the file system could not read the data it
intended to read (2 read-ahead blocks in this case).
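
The policy proposed above (warn on a failed read-ahead, but do not treat it
as file system corruption) can be illustrated with a small userspace sketch.
This is not ext4 code: FsState, ReadKind and handle_read_result are
hypothetical names invented for the example, and the behaviour for demand
reads is an assumption made only for contrast.

```python
import errno
from dataclasses import dataclass, field
from enum import Enum


class ReadKind(Enum):
    DEMAND = "demand"        # the block the caller actually asked for
    READAHEAD = "readahead"  # speculative read of a following block


@dataclass
class FsState:
    """Hypothetical stand-in for per-filesystem error state."""
    warnings: list = field(default_factory=list)
    needs_fsck: bool = False      # hint that an e2fsck run is recommended
    marked_corrupt: bool = False  # would trigger the errors= policy


def handle_read_result(fs, block, kind, err):
    """Apply the proposed policy to one completed block read.

    err is 0 on success or a positive errno value on failure.
    Returns 0 if the caller may continue, or a negative errno.
    """
    if err == 0:
        return 0
    if kind is ReadKind.READAHEAD:
        # Speculative read failed: record a warning and recommend fsck,
        # but do not declare corruption or fail the current operation.
        fs.warnings.append(
            f"readahead of inode table block {block} failed: errno {err}"
        )
        fs.needs_fsck = True
        return 0
    # Assumption for contrast: a failed demand read is a hard error.
    fs.marked_corrupt = True
    return -err
```

With this split, a media error on a read-ahead block leaves a trace for the
user without invoking errors=panic style handling, which only a failed
demand read would reach.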
