On Mon, Jan 25, 2010 at 10:23 AM, Ric Wheeler <rwheeler@xxxxxxxxxx> wrote:
> On 01/18/2010 06:33 PM, Anton Altaparmakov wrote:
>>
>> Hi,
>>
>> On 18 Jan 2010, at 14:00, Nick Piggin wrote:
>>
>>> For write errors, you could also do block re-allocation, which would
>>> be fun.
>>
>> Yes it would. (-:
>>
>> FWIW, Windows does this with Microsoft's NTFS driver. When a write
>> fails due to a bad block, the block is marked as bad (recorded in the
>> bad cluster list and marked as allocated in the in-use bitmap so no
>> one tries to allocate it), a new block is allocated, the inode
>> metadata is updated to reflect the change in the logical-to-physical
>> block map of the file the block belongs to, and the write is then
>> retried at its new location.
>>
>> I have never bothered implementing this in NTFS on Linux, partly
>> because there does not seem to be any obvious way to do it inside the
>> file system. I think the VFS and/or the block layer would have to
>> offer help in some way. What I mean, for example, is that if
>> ->writepage fails, the failure is only detected inside the
>> asynchronous I/O completion handler, at which point the page is no
>> longer locked, it is marked as being under writeback, and we are in
>> IRQ context (or something), so it is not easy to see how we can get
>> from there to doing all the actions described above, which require
>> memory allocation, disk I/O, etc. I suppose a separate thread could
>> do it, where we just schedule the work to be done. But the problem
>> with that is that the work might later fail, so we cannot simply
>> pretend the block was written successfully; yet we do not want to
>> report an error either, or the upper layers would pick it up even
>> though we will hopefully correct it in due course...
>>
>> Best regards,
>>
>> Anton
>
> For permanent write errors, I would expect any modern drive to do the
> sector remapping internally. We should never need to track this kind
> of information for any modern device that I know of (S-ATA, SAS, SSDs
> and RAID arrays should all handle this).
>
> It would not seem to be worth the complexity.
>
> Also keep in mind that retrying I/O errors is not always a good thing:
> devices already retry failed I/O multiple times internally. Adding
> additional retry loops up the stack only makes our unavoidable I/O
> error take much longer to hit!
>
> Ric

I thought write errors returned by modern drives (the last 15 years)
were in general caused by bad cables, controllers, power supplies, etc.
When a media error is returned on a write, it indicates that the drive's
spare sector area is full. A media write error is thus a major error. I
would think that, if anything, we should turn the filesystem read-only
upon a media write error, not try to hide such a major problem.

Greg
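
P.S. To make the remap-on-write-error bookkeeping Anton describes
concrete, here is a minimal user-space C sketch of the idea: a
bad-cluster list, an in-use bitmap, and a per-file logical-to-physical
map, with a failed write triggering reallocation and a retry. All the
names (write_block, alloc_block, file_map, etc.) are invented for
illustration; a real file system would of course also have to journal
and write back the metadata updates.

#include <stdio.h>
#include <stdint.h>

#define NBLOCKS 64

/* Toy volume state: an in-use bitmap, a bad-cluster list, and one
 * file's logical-to-physical block map. */
static uint8_t in_use[NBLOCKS];
static uint8_t bad[NBLOCKS];
static int file_map[8] = { 3, 4, 5, 6, 7, 8, 9, 10 };

static int write_block(int phys)
{
	/* Pretend physical block 5 has gone bad. */
	return phys == 5 ? -1 : 0;
}

static int alloc_block(void)
{
	for (int i = 0; i < NBLOCKS; i++)
		if (!in_use[i] && !bad[i]) {
			in_use[i] = 1;
			return i;
		}
	return -1;
}

/* Write logical block `lcn` of the file, remapping on failure the way
 * the Windows NTFS driver is described to. */
static int write_file_block(int lcn)
{
	int phys = file_map[lcn];

	while (write_block(phys) != 0) {
		bad[phys] = 1;		/* record in the bad cluster list */
		in_use[phys] = 1;	/* never hand it out again */
		phys = alloc_block();
		if (phys < 0)
			return -1;	/* volume out of space */
		file_map[lcn] = phys;	/* update logical->physical map */
	}
	return 0;
}

int main(void)
{
	for (int i = 0; i < NBLOCKS; i++)
		in_use[i] = (i <= 10);

	int rc = write_file_block(2);	/* maps to phys 5, which "fails" */
	printf("write %s, lcn 2 now at phys %d\n",
	       rc ? "failed" : "ok", file_map[2]);
	return 0;
}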
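And on the "separate thread" idea: a kernel-style fragment (not a
complete or compilable module; remap_work, remap_worker and
write_endio_failed are hypothetical names) of how a completion handler
running in IRQ context could defer the remap to process context via a
workqueue:

#include <linux/workqueue.h>
#include <linux/slab.h>
#include <linux/pagemap.h>

/* Hypothetical per-failure context. */
struct remap_work {
	struct work_struct work;
	struct page *page;	/* page whose write failed */
	sector_t bad_block;	/* physical block that returned the error */
};

static void remap_worker(struct work_struct *work)
{
	struct remap_work *rw = container_of(work, struct remap_work, work);

	/*
	 * Process context: we may sleep, allocate memory, and do I/O.
	 * The steps from the NTFS description would go here:
	 *   1. record rw->bad_block in the bad cluster list;
	 *   2. mark it allocated in the in-use bitmap;
	 *   3. allocate a replacement block and update the file's
	 *      logical-to-physical map;
	 *   4. resubmit the write.
	 */

	/* In this sketch we end writeback immediately; in reality that
	 * would happen in the completion handler of the resubmitted
	 * write, or an error would be reported if the retry fails too. */
	end_page_writeback(rw->page);
	kfree(rw);
}

/* Called from the bio completion handler, i.e. IRQ context: we cannot
 * block here, so just queue the heavyweight work. */
static void write_endio_failed(struct remap_work *rw)
{
	INIT_WORK(&rw->work, remap_worker);
	schedule_work(&rw->work);
}

As Anton notes, the unsolved part is not the deferral mechanism but the
error reporting: the page stays under writeback until the retry
completes, and if the retry also fails there is no longer an obvious
place to return the error to.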