On Fri 18-03-11 17:07:55, Darrick J. Wong wrote:
> > > Ok, here's what I have so far. I took everyone's suggestions of where to add
> > > calls to wait_on_page_writeback, which seems to handle the multiple-write case
> > > adequately. Unfortunately, it is still possible to generate checksum errors by
> > > scribbling furiously on a mmap'd region, even after adding the writeback wait
> > > in the ext4 writepage function. Oddly, I couldn't break btrfs with mmap by
> > > removing its wait_for_page_writeback call, so I suspect there's a bit more
> > > going on in btrfs than I've been able to figure out.
>
> I wonder, is it possible for this to happen:
>
> 1. Thread A mmaps a page and tries to write to it. ext4_page_mkwrite executes,
> but there's no ongoing writeback, so it returns without delay.
> 2. Thread A starts writing furiously to the page.
> 3. Thread B runs fsync() or something that results in the page being
> checksummed and scheduled for writeout.
> 4. Thread A continues to write furiously(!) on that same page before the
> controller finishes the DMA transfer.
> 5. Disk gets the page, which now doesn't match its checksum, and *boom*

  What happens on writepage (see mm/page-writeback.c:write_cache_pages())
is:
  lock_page(page)
  ...
  clear_page_dirty_for_io() - removes PageDirty, marks page as read-only
    in PTE
  ...
  set_page_writeback() (happens e.g. in __block_write_full_page() called
    from filesystem's writepage implementation).
  unlock_page(page)

  So if you compute the checksum after set_page_writeback() is done in the
writepage() implementation (you cannot use __block_write_full_page() in
that case) and you call wait_on_page_writeback() in ext4_page_mkwrite()
under page lock, you should be safe. If you do all this and still see
errors, something is broken I'd say...

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR