Hi again Andreas:

On Mon, Apr 25, 2011 at 3:45 PM, Curt Wohlgemuth <curtw@xxxxxxxxxx> wrote:
> Hi Andreas:
>
> On Mon, Apr 25, 2011 at 3:40 PM, Andreas Dilger <adilger@xxxxxxxxx> wrote:
>> On 2011-04-25, at 2:23 PM, Curt Wohlgemuth wrote:
>>> In the bio completion routine, we should not be setting
>>> PageUptodate at all -- it's set at sys_write() time, and is
>>> unaffected by success/failure of the write to disk.
>>>
>>> This can cause a page corruption bug when
>>>
>>>     block size < page size
>>>
>>> @@ -203,46 +203,29 @@ static void ext4_end_bio(struct bio *bio, int error)
>>> -             /*
>>> -              * If this is a partial write which happened to make
>>> -              * all buffers uptodate then we can optimize away a
>>> -              * bogus readpage() for the next read(). Here we
>>> -              * 'discover' whether the page went uptodate as a
>>> -              * result of this (potentially partial) write.
>>> -              */
>>> -             if (!partial_write)
>>> -                     SetPageUptodate(page);
>>> -
>>
>> I think this is the important part of the code - if there is a read-after-write for a file that was written in "blocksize" units (blocksize < pagesize), does the page get set uptodate when all of the blocks have been written and/or the writing is at EOF? Otherwise, a read-after-write will always cause data to be fetched from disk needlessly, even though the uptodate information is already in cache.
>
> Hmm, that's a good question. I would kind of doubt that the page
> would be marked uptodate when the final block was written, and this
> might be what the code above was trying to do. It wasn't doing it
> correctly :-), but it might have possibly avoided the extra read when
> there was no error.
>
> I'll look at this some more, and see if I can't test for your scenario
> above. Perhaps at least checking that all BHs in the page are mapped
> + uptodate => SetPageUptodate would not be out of line.

My testing is not showing the read coming through after writing to the
4 blocks of a 4K file, using 1K blocksize.
And it seems to me that this is taken care of in __block_commit_write(),
which is called from all the .write_end callbacks for ext4, at least.

Thanks,
Curt

>
> Thanks,
> Curt
>
>>
>> Cheers, Andreas
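
For illustration, the partial-write check discussed above can be modelled in
user space roughly as follows. This is only a simplified sketch, not the
kernel code: struct toy_page, toy_blocks_per_page() and toy_commit_write()
are hypothetical names, the page size is assumed to be 4K, and the real
function also handles buffer dirtying and other state that is left out here.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for this sketch */
#define TOY_MAX_BLOCKS 8u     /* 4096 / 512, the smallest block size modelled */

/* Hypothetical stand-in for a page plus its per-block buffer state. */
struct toy_page {
    unsigned blocksize;                /* e.g. 1024 for a 1K-block filesystem */
    bool bh_uptodate[TOY_MAX_BLOCKS];  /* one flag per block in the page */
    bool page_uptodate;                /* models the PageUptodate flag */
};

static unsigned toy_blocks_per_page(const struct toy_page *p)
{
    /* The model only covers block sizes that evenly divide the page. */
    assert(p->blocksize >= 512 && TOY_PAGE_SIZE % p->blocksize == 0);
    return TOY_PAGE_SIZE / p->blocksize;
}

/*
 * Models a write_end-time walk over the page's buffers for a write
 * covering the byte range [from, to) within the page. Blocks that
 * overlap the range become uptodate. If any block outside the range
 * is not yet uptodate, the write is "partial" and the page flag is
 * left untouched; otherwise the whole page is marked uptodate.
 */
static void toy_commit_write(struct toy_page *p, unsigned from, unsigned to)
{
    bool partial = false;
    unsigned nblocks = toy_blocks_per_page(p);

    for (unsigned i = 0; i < nblocks; i++) {
        unsigned start = i * p->blocksize;
        unsigned end = start + p->blocksize;

        if (end <= from || start >= to) {
            /* Block not touched by this write. */
            if (!p->bh_uptodate[i])
                partial = true;
        } else {
            p->bh_uptodate[i] = true;
        }
    }

    if (!partial)
        p->page_uptodate = true;
}
```

In this model, writing the four 1K blocks of a 4K page one at a time leaves
page_uptodate clear after the first three calls and sets it on the fourth,
which would be consistent with a read-after-write not having to go to disk.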