Re: [PATCH] e2fsprogs: Try again to solve unreliable io case

Zhiqiang Liu <liuzhiqiang26@xxxxxxxxxx> · Sat, 24 Apr 2021 12:46:17 +0800

On 2021/4/23 23:46, Theodore Ts'o wrote:
> On Fri, Apr 23, 2021 at 10:18:09AM +0800, Zhiqiang Liu wrote:
>> Thanks for your reply.
>> Actually, we have hit this problem in an IP SAN setup.
>> When running 'fsck -a <remote-device>', short-term fluctuations or
>> abnormalities may occur on the network. Even though the driver makes
>> its best effort, some IO errors may still occur. So adding retries in
>> e2fsprogs can further improve the reliability of the repair
>> process.
> 
> But why doesn't this happen when the file system is mounted, and why
> is that acceptable?   And why not change the driver to do more retries?
> 
>    		      	      	  - Ted
> 
Actually, this can also happen when the filesystem is mounted. The
difference is that a mounted filesystem can rely on the journal to keep
itself consistent.

For example, if an IO error occurs while io_channel_write_byte() is
updating the superblock, the checksum may not be written to the disk
successfully. A checksum error then results, and the filesystem cannot be
repaired with 'fsck -y|a|f'.

The probability of this situation is very low. Still, to improve the
reliability of the repair process, retries in e2fsprogs may be necessary.
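
To make the idea concrete, here is a rough sketch of a bounded retry around
a positioned write. It is not the patch itself and does not use the
libext2fs io_channel code; retry_write(), write_fn, flaky_pwrite() and the
retry/delay parameters are hypothetical names chosen for illustration, and
treating only EIO as transient is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical hook so the low-level write can be swapped out for testing. */
typedef ssize_t (*write_fn)(int fd, const void *buf, size_t count, off_t offset);

/*
 * Write 'count' bytes at 'offset', retrying up to 'max_retries' times when
 * the write fails with EIO (assumed here to be a transient error) or makes
 * no progress. Returns 0 on success, -1 with errno set on failure.
 */
static int retry_write(write_fn do_write, int fd, const void *buf,
                       size_t count, off_t offset,
                       int max_retries, long delay_ms)
{
    const char *p = buf;
    int attempts = 0;

    while (count > 0) {
        ssize_t n = do_write(fd, p, count, offset);

        if (n > 0) {                    /* progress, possibly a short write */
            p += n;
            count -= (size_t)n;
            offset += n;
            continue;
        }
        if (n < 0 && errno == EINTR)    /* interrupted: just call again */
            continue;
        if (n < 0 && errno != EIO)      /* not treated as transient */
            return -1;
        if (++attempts > max_retries) { /* retry budget exhausted */
            errno = EIO;
            return -1;
        }
        /* Back off briefly before the next attempt. */
        struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }
    return 0;
}

/* Fault injector for experiments: fails the next N calls with EIO. */
static int flaky_failures_left;

static ssize_t flaky_pwrite(int fd, const void *buf, size_t count, off_t offset)
{
    if (flaky_failures_left > 0) {
        flaky_failures_left--;
        errno = EIO;
        return -1;
    }
    return pwrite(fd, buf, count, offset);
}
```

With flaky_failures_left = 2 and max_retries = 3, the write succeeds on the
third attempt; with more failures than the retry budget allows, the caller
still sees EIO, so a persistent fault is reported rather than hidden.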

Regards
Zhiqiang Liu.
