On Mon, Jan 25, 2010 at 10:23 AM, Ric Wheeler <rwheeler@xxxxxxxxxx> wrote:
> On 01/18/2010 06:33 PM, Anton Altaparmakov wrote:
>>
>> Hi,
>>
>> On 18 Jan 2010, at 14:00, Nick Piggin wrote:
>>
>>> For write errors, you could also do block re-allocation, which would
>>> be fun.
>>
>> Yes it would. (-:
>>
>> FWIW, Windows does this with Microsoft's NTFS driver. When a write
>> fails due to a bad block, the block is marked as bad (recorded in the
>> bad cluster list and marked as allocated in the in-use bitmap so no
>> one tries to allocate it), a new block is allocated, the inode
>> metadata is updated to reflect the change in the logical-to-physical
>> block map of the file the block belongs to, and the write is then
>> retried at its new location.
>>
>> I have never bothered implementing this in NTFS on Linux, partly
>> because there does not seem to be any obvious way to do it inside the
>> file system. I think the VFS and/or the block layer would have to
>> offer help in some way. What I mean, for example, is that if
>> ->writepage fails, the failure is only detected inside the
>> asynchronous I/O completion handler, at which point the page is no
>> longer locked, it is marked as being under writeback, and we are in
>> IRQ context (or something), so it is not easy to see how we can get
>> from there to doing all the actions described above, which require
>> memory allocation, disk I/O, etc. I suppose a separate thread could
>> do it, where we just schedule the work to be done. But the problem
>> with that is that the work might later fail, so we cannot simply
>> pretend the block was written successfully; yet we do not want to
>> report an error either, or the upper layers would pick it up even
>> though we will hopefully correct it in due course...
>>
>> Best regards,
>>
>> Anton
>
> For permanent write errors, I would expect any modern drive to do the
> sector remapping internally. We should never need to track this kind
> of information for any modern device that I know of (S-ATA, SAS, SSDs
> and RAID arrays should all handle this).
>
> It would not seem to be worth the complexity.
>
> Also keep in mind that retrying I/O errors is not always a good thing:
> devices already retry failed I/O multiple times internally. Adding
> additional retry loops up the stack only makes our unavoidable I/O
> error take much longer to hit!
>
> Ric

I thought write errors returned by modern drives (the last 15 years)
were in general caused by bad cables, controllers, power supplies, etc.
When a media error is returned on a write, it indicates that the drive's
spare sector area is full. A media write error is thus a major error. I
would think that, if anything, we should turn the filesystem read-only
upon a media write error, not try to hide such a major problem.

Greg
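
P.S. To make the remap-on-write-error bookkeeping Anton describes
concrete, here is a minimal user-space C sketch of the idea: a
bad-cluster list, an in-use bitmap, and a per-file logical-to-physical
map, with a failed write triggering reallocation and a retry. All the
names (write_block, alloc_block, file_map, etc.) are invented for
illustration; a real file system would of course also have to journal
and write back the metadata updates.

#include <stdio.h>
#include <stdint.h>

#define NBLOCKS 64

/* Toy volume state: an in-use bitmap, a bad-cluster list, and one
 * file's logical-to-physical block map. */
static uint8_t in_use[NBLOCKS];
static uint8_t bad[NBLOCKS];
static int file_map[8] = { 3, 4, 5, 6, 7, 8, 9, 10 };

static int write_block(int phys)
{
	/* Pretend physical block 5 has gone bad. */
	return phys == 5 ? -1 : 0;
}

static int alloc_block(void)
{
	for (int i = 0; i < NBLOCKS; i++)
		if (!in_use[i] && !bad[i]) {
			in_use[i] = 1;
			return i;
		}
	return -1;
}

/* Write logical block `lcn` of the file, remapping on failure the way
 * the Windows NTFS driver is described to. */
static int write_file_block(int lcn)
{
	int phys = file_map[lcn];

	while (write_block(phys) != 0) {
		bad[phys] = 1;		/* record in the bad cluster list */
		in_use[phys] = 1;	/* never hand it out again */
		phys = alloc_block();
		if (phys < 0)
			return -1;	/* volume out of space */
		file_map[lcn] = phys;	/* update logical->physical map */
	}
	return 0;
}

int main(void)
{
	for (int i = 0; i < NBLOCKS; i++)
		in_use[i] = (i <= 10);

	int rc = write_file_block(2);	/* maps to phys 5, which "fails" */
	printf("write %s, lcn 2 now at phys %d\n",
	       rc ? "failed" : "ok", file_map[2]);
	return 0;
}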
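And on the "separate thread" idea: a kernel-style fragment (not a
complete or compilable module; remap_work, remap_worker and
write_endio_failed are hypothetical names) of how a completion handler
running in IRQ context could defer the remap to process context via a
workqueue:

#include <linux/workqueue.h>
#include <linux/slab.h>
#include <linux/pagemap.h>

/* Hypothetical per-failure context. */
struct remap_work {
	struct work_struct work;
	struct page *page;	/* page whose write failed */
	sector_t bad_block;	/* physical block that returned the error */
};

static void remap_worker(struct work_struct *work)
{
	struct remap_work *rw = container_of(work, struct remap_work, work);

	/*
	 * Process context: we may sleep, allocate memory, and do I/O.
	 * The steps from the NTFS description would go here:
	 *   1. record rw->bad_block in the bad cluster list;
	 *   2. mark it allocated in the in-use bitmap;
	 *   3. allocate a replacement block and update the file's
	 *      logical-to-physical map;
	 *   4. resubmit the write.
	 */

	/* In this sketch we end writeback immediately; in reality that
	 * would happen in the completion handler of the resubmitted
	 * write, or an error would be reported if the retry fails too. */
	end_page_writeback(rw->page);
	kfree(rw);
}

/* Called from the bio completion handler, i.e. IRQ context: we cannot
 * block here, so just queue the heavyweight work. */
static void write_endio_failed(struct remap_work *rw)
{
	INIT_WORK(&rw->work, remap_worker);
	schedule_work(&rw->work);
}

As Anton notes, the unsolved part is not the deferral mechanism but the
error reporting: the page stays under writeback until the retry
completes, and if the retry also fails there is no longer an obvious
place to return the error to.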