On Mon, Jan 16, 2012 at 10:55:55AM -0800, Linus Torvalds wrote:
> On Mon, Jan 16, 2012 at 8:01 AM, Jan Kara <jack@xxxxxxx> wrote:
> >
> > Hum, let me understand this. I understand the meaning of buffer_uptodate
> > bit as "the buffer has at least as new content as what is on disk". Now
> > when storage cannot write the block under the buffer, the contents of the
> > buffer is still "at least as new as what is (was) on disk".
>
> No.
>
> Stop making crap up.

Jan is right, Linus. His definition of what up-to-date means for
dirty buffers is correct, especially in the case of write errors.

> If the write fails, the buffer contents have *nothing* to do with what
> is on disk.

The dirty buffer contains what is *supposed* to be on disk. If we
fail to write it, we corrupt some application's data.

> You don't know what the disk contents are.

But *we don't care* what is on disk after a write error because
there is no guarantee that after a write error we can even read the
previous data that was on disk. IOWs, the contents of the region on
disk where the write failed are -undefined- and cannot be trusted.

> So clearly the buffer cannot be up-to-date.

What we have in memory is what is *supposed* to be on disk, and the
error is telling us that the disk is failing to be made up-to-date.
IOWs, the disk is stale after a write error, not what is in memory.
So clearly the buffer contains the up-to-date version of the data
after a write error.

How the filesystem handles that error is now up to the filesystem.
For example, the filesystem can choose to allocate new blocks for the
failed write, write the valid, up-to-date in-memory data to a
different location and continue onwards without errors. From this
example, it's pretty obvious that the data in memory is the data
that we need to care about after a write error, not what is on disk.
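
To make the reallocate-and-rewrite example concrete, here is a minimal
userspace sketch in C. It is a toy model, not kernel code: struct
toy_disk, struct toy_buffer, toy_write_block(), toy_alloc_block() and
toy_writeback() are all hypothetical names made up for illustration,
and the "disk" is just an array in memory. The only point it tries to
show is that the buffer stays up-to-date and dirty across the failed
write, so its contents can be written somewhere else.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 8
#define BLKSZ   16

/* Toy stand-in for a block device (hypothetical, not a kernel type). */
struct toy_disk {
	char data[NBLOCKS][BLKSZ];
	bool bad[NBLOCKS];	/* writes to these blocks fail with -EIO */
	bool used[NBLOCKS];	/* trivial allocation map */
};

/* Toy stand-in for a buffer head (hypothetical, not a kernel type). */
struct toy_buffer {
	char data[BLKSZ];
	int blocknr;		/* block currently backing this buffer */
	bool uptodate;		/* memory holds the valid data */
	bool dirty;		/* memory is newer than the disk */
};

/* Write one block; fail if the toy disk marks it bad. */
static int toy_write_block(struct toy_disk *d, int blk, const char *src)
{
	assert(blk >= 0 && blk < NBLOCKS);
	if (d->bad[blk])
		return -EIO;
	memcpy(d->data[blk], src, BLKSZ);
	return 0;
}

/* Hand out the first free block, or -ENOSPC if none is left. */
static int toy_alloc_block(struct toy_disk *d)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (!d->used[i]) {
			d->used[i] = true;
			return i;
		}
	}
	return -ENOSPC;
}

/*
 * Write the buffer back. On a write error the buffer is left
 * up-to-date and dirty (memory is still the valid copy), a new block
 * is allocated and the write is retried there. The failed block stays
 * marked used so it is never handed out again.
 */
static int toy_writeback(struct toy_disk *d, struct toy_buffer *b)
{
	while (b->dirty) {
		if (toy_write_block(d, b->blocknr, b->data) == 0) {
			b->dirty = false;	/* disk now matches memory */
			break;
		}
		int blk = toy_alloc_block(d);
		if (blk < 0)
			return blk;	/* still uptodate and dirty */
		b->blocknr = blk;	/* remap and retry */
	}
	return 0;
}
```

In this model, if block 0 is bad and a dirty buffer is mapped to it,
toy_writeback() ends up remapping the buffer to the next free block
and writing the in-memory data there. At no point does it consult the
old on-disk contents, and at no point does it clear the uptodate flag.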

> Now, feel free to use *other* arguments for why we shouldn't clear the
> up-to-date bit, but using the disk contents as one is pure and utter
> garbage. And it is *obviously* pure and utter garbage.

For the read case you are correct, but that logic (that the disk
version is always correct) does not apply to handling write errors.
It's an important distinction....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx